Field of the Invention



The present invention relates to the encoding and decoding of 
audio signals and in particular to error concealment in digi- 
tal encoded audio signals. 



As a result of the increasingly widespread use of modern audio 
encoders and the corresponding audio decoders, which operate
according to one of the MPEG standards, the transmission of 
encoded audio signals over radio networks or line-based net- 
works such as the internet has already become very important. 
The transmission channel involved in the transmission of en- 
coded audio signals by means of digital radio or over line- 
based networks is not ideal, which can result in encoded audio 
signals being adversely affected during the transmission. The 
decoder is therefore confronted with the question of how to 
deal with transmission errors, i.e. how these transmission er- 
rors are to be "concealed". The objective of error concealment 
is to manipulate transmission errors in such a way as to im- 
prove the subjective auditory sensation arising from such an 
error-afflicted decoded audio signal. 

Many error concealment methods are already known. The simplest 
type of error concealment is that of "muting". When a decoder
recognizes that data are missing or are erroneous, it inter- 
rupts the reproduction. The missing data are thus replaced by 
a zero signal. In this way the decoder is prevented from issu- 
ing sounds which, due to a transmission error, would be found 
too loud or disconcerting. Because of psychoacoustic effects,
however, the resulting sudden fall in the signal energy and 
its sudden rise when the decoder issues error-free data again 
is found disconcerting. 

Another known method which avoids the sudden fall and subsequent
rise in the signal energy is that of data repetition. If
e.g. one or more blocks of audio data are missing, part of the 
data last transmitted are repeated in a loop until error-free, 
i.e. intact, audio data are available again. This method pro- 
duces disturbing artefacts, however. If only short parts of 
the audio signal are repeated, the repeated signal sounds mechanical
whatever the original signal may have been like, having
a fundamental frequency equal to the repetition frequency. If
longer parts are repeated, certain echo effects arise which 
are also found disturbing. 
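
For illustration only, the two known methods described above, muting and data repetition, might be sketched in Python as follows. The frame representation (a list of sample lists), the set of lost frame indices and the function names are assumptions made for this sketch and are not taken from any particular decoder.

```python
def conceal_by_muting(frames, lost):
    """Replace every lost frame with a zero signal of the same length."""
    return [[0.0] * len(f) if i in lost else list(f)
            for i, f in enumerate(frames)]


def conceal_by_repetition(frames, lost):
    """Repeat the last intact frame until intact data are available again."""
    out = []
    last_good = None
    for i, frame in enumerate(frames):
        if i not in lost:
            last_good = frame
            out.append(list(frame))
        elif last_good is not None:
            out.append(list(last_good))      # loop the data last received
        else:
            out.append([0.0] * len(frame))   # nothing to repeat yet: mute
    return out
```

Muting produces the sudden fall and rise in signal energy mentioned above, whereas repetition keeps the energy but introduces the mechanical or echo-like artefacts.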

In block-oriented transform encoders/decoders that employ a 
spectral representation of a temporal audio signal, the possi- 
bility would also exist of performing a spectral value predic- 
tion in the case of erroneous audio data. If it is established 
that spectral values in a block are erroneous, these spectral 
values can be predicted, i.e. estimated, on the basis of the 
spectral values of a preceding frame or a number of preceding 
frames. The predicted spectral values correspond within cer- 
tain limits to the erroneous spectral values if the audio sig- 
nal is relatively steady, i.e. if the audio signal is not sub- 
ject to any very fast changes in the signal envelope. If e.g. 
a method employing the MPEG AAC standard (ISO/IEC 13818-7 
MPEG-2 Advanced Audio Coding) is considered, a normal block
or frame of encoded audio data has 1024 spectral values. For 
the method of spectral value prediction 1024 parallel operat- 
ing predictors will therefore be needed in the decoder so 
that, if a complete frame is lost, all the spectral values can 
be predicted. 

A disadvantage of this method is the relatively high computational
effort, which makes a real-time decoding of a received
multimedia or audio data signal impossible at present. 

A further important disadvantage of this method results from 
the transform algorithm, namely the modified discrete cosine 
transform (MDCT), which is used. It is generally known that
the MDCT algorithm does not provide an ideal Fourier spectrum 
but a "spectrum" which deviates from an ideal Fourier spectrum.
Investigations have shown that a sine time function, for example,
which has a Fourier spectrum with a single spectral line
at the frequency of the sine function, has an MDCT "spectrum" 
which, while it has a dominant spectral coefficient at the 
frequency of the sine function, also has in addition further 
spectral coefficients at other frequency values. Furthermore, 
the height of an MDCT "spectrum" of a sine function does not 
remain the same from one frame to another but varies from 
frame to frame. Another fact is that the MDCT transform is not 
strictly energy conserving. What can be stated, therefore, is 
that, while the MDCT transform works exactly in conjunction 
with an inverse MDCT transform, the MDCT spectrum differs con- 
siderably from a Fourier spectrum. A spectral value prediction 
of MDCT spectral coefficients has thus shown itself to be in- 
adequate when high precision is required. 

A further disadvantage of spectral value prediction, particu- 
larly in connection with modern audio coding methods, is that 
modern audio coding methods use different window lengths or 
window shapes. To prevent the quantization noise arising from 
the quantization of the MDCT spectral coefficients being
"smeared" over a long block, i.e. the occurrence of pre-echoes,
when there are rapid changes (transients or "attacks") in the
audio signal to be encoded, modern transform
encoders use short windows for transient audio signals, i.e. 
audio signals with "attacks", to increase the temporal resolution
at the expense of the frequency resolution. This means,
however, that for a spectral value prediction both the window
length and the window shape (in addition there are transition
windows to initiate windowing from short to long blocks and
vice versa) must be constantly taken into account, which also
increases the complexity of the spectral value prediction and 
would greatly affect the computational efficiency. 

DE 40 34 017 A1 relates to a method for detecting errors in
the transmission of frequency coded digital signals. From the
frequency coefficients of previous and, in some cases, future
frames, an error function is formed on the basis of which the 
occurrence of an error can be detected. An erroneous frequency 
coefficient is no longer included in the evaluation of subse- 
quent frames. 

DE 197 35 675 A1 discloses a method for concealing errors in
an audio data stream. The spectral energy of a subgroup of in- 
tact audio data is calculated. After producing a pattern for 
substitute data using the spectral energy calculated for the 
subgroup of intact audio data, substitute data for erroneous 
or missing audio data corresponding to the subgroup are gener- 
ated according to the pattern. 

Summary of the Invention 

It is the object of the present invention to provide precise 
and flexible error concealment for audio signals which can be 
implemented with limited computational effort and an error- 
tolerant and flexible decoding of audio signals. 

In accordance with a first aspect of the present invention,
this object is achieved by a method for concealing an error in 
an encoded audio signal, where the encoded audio signal has 
successive sets of spectral coefficients, where a set of spec- 
tral coefficients is a spectral representation for a set of 
audio sampled values, comprising the following steps: subdi- 
viding a current set of spectral coefficients into at least 
two sub-bands with different frequency ranges, where one sub- 
band of the at least two sub-bands has at least two spectral 
coefficients; reverse transforming the spectral coefficients 
of the one sub-band to obtain a temporal representation of the 
at least two spectral coefficients of the one sub-band; per- 
forming a prediction using the temporal representation of the 
at least two spectral coefficients of the one sub-band to ob- 
tain an estimated temporal representation for a sub-band of a 
set following the current set, where the sub-band of the fol- 
lowing set has the same frequency range as the sub-band of the 
current set; forward transforming the estimated temporal rep- 
resentation to obtain at least two estimated spectral coeffi- 
cients for the sub-band of the following set; determining 
whether a spectral coefficient of the sub-band of the following
set is erroneous; and, in reaction to the step of determining,
if there is an erroneous spectral coefficient, using an
estimated spectral coefficient instead of an erroneous spec- 
tral coefficient of the following set so as to conceal the er- 
roneous spectral coefficient of the following set. 

In accordance with a second aspect of the present invention, 
this object is achieved by a method for decoding an encoded 
audio signal which comprises successive sets of spectral coef- 
ficients, wherein a set of spectral coefficients is a spectral 
representation for a set of audio sampled values, comprising: receiving a
current set of spectral coefficients; subdividing a current 
set of spectral coefficients into at least two sub-bands with 
different frequency ranges, where one sub-band of the at least
two sub-bands has at least two spectral coefficients; reverse
transforming the spectral coefficients of the one sub-band to 
obtain a temporal representation of the at least two spectral 
coefficients of the one sub-band; performing a prediction us- 
ing the temporal representation of the at least two spectral 
coefficients of the one sub-band to obtain an estimated tempo- 
ral representation for a sub-band of a set following the cur- 
rent set, where the sub-band of the following set has the same 
frequency range as the sub-band of the current set; forward 
transforming the estimated temporal representation to obtain 
at least two estimated spectral coefficients for the sub-band 
of the following set; receiving a following set of spectral 
coefficients and subdividing the following set into sub-bands 
which cover the same frequency range as the sub-bands of the 
current set; determining whether a spectral coefficient of the 
sub-band of the following set is erroneous; in reaction to the
step of determining, if there is an erroneous spectral coeffi- 
cient, using an estimated spectral coefficient instead of an 
erroneous spectral coefficient of the following set so as to 
conceal the erroneous spectral coefficient of the following 
set; and processing the following set using the estimated 
spectral coefficient used in the step of using to obtain the 
following set of audio sampled values. 

In accordance with a third aspect of the present invention, 
this object is achieved by a device for concealing an error in 
an encoded audio signal, where the encoded audio signal has 
successive sets of spectral coefficients, where a set of spec- 
tral coefficients is a spectral representation for a set of 
audio sampled values, comprising: a unit for subdividing a 
current set of spectral coefficients into at least two sub- 
bands with different frequency ranges, where one sub-band of 
the at least two sub-bands has at least two spectral coefficients;
a unit for reverse transforming the spectral coefficients
of the one sub-band to obtain a temporal representation
of the at least two spectral coefficients of the one sub-band; 
a unit for performing a prediction using the temporal repre- 
sentation of the at least two spectral coefficients of the one 
sub-band to obtain an estimated temporal representation for a 
sub-band of a set following the current set, where the sub- 
band of the following set has the same frequency range as the 
sub-band of the current set; a unit for forward transforming 
the estimated temporal representation to obtain at least two 
estimated spectral coefficients for the sub-band of the fol- 
lowing set; a unit for determining whether a spectral coeffi- 
cient of the sub-band of the following set is erroneous; and a 
unit for using an estimated spectral coefficient instead of an 
erroneous spectral coefficient of the following set so as to 
conceal the erroneous spectral coefficient of the following 
set.

In accordance with a fourth aspect of the present invention, 
this object is achieved by a device for decoding an encoded 
audio signal which comprises successive sets of spectral coef- 
ficients, where a set of spectral coefficients is a spectral 
representation for a set of audio sampled values, comprising: 
a unit for receiving a current set of spectral coefficients;
a unit for subdividing a current set of spectral coefficients
into at least two sub-bands with different frequency ranges, 
where one sub-band of the at least two sub-bands has at least 
two spectral coefficients; a unit for reverse transforming the 
spectral coefficients of the one sub-band to obtain a temporal 
representation of the at least two spectral coefficients of 
the one sub-band; a unit for performing a prediction using the 
temporal representation of the at least two spectral coeffi- 
cients of the one sub-band to obtain an estimated temporal 
representation for a sub-band of a set following the current 
set, where the sub-band of the following set has the same frequency
range as the sub-band of the current set; a unit for
forward transforming the estimated temporal representation to 
obtain at least two estimated spectral coefficients for the 
sub-band of the following set; a unit for receiving a follow- 
ing set of spectral coefficients and for subdividing the fol- 
lowing set into sub-bands which cover the same frequency range 
as the sub-bands of the current set; a unit for determining 
whether a spectral coefficient of the sub-band of the follow- 
ing set is erroneous; a unit for using an estimated spectral 
coefficient instead of an erroneous spectral coefficient of 
the following set so as to conceal the erroneous spectral co- 
efficient of the following set; and a unit for processing the 
following set using the estimated spectral coefficient to ob- 
tain the following set of audio sampled values. 

The present invention is based on the finding that the disad- 
vantages of the spectral value prediction, which reside in the 
dependence on the transform algorithm which is used and in the 
dependence on the window shape and block length, can be 
avoided by performing error concealment by means of a predic- 
tion which functions in the "quasi" time domain. To this end a 
set of spectral values which preferably corresponds to a long 
block or a number of short blocks is subdivided into sub- 
bands. A sub-band of the current set of spectral coefficients 
can then undergo a reverse transform so as to obtain a time 
signal corresponding to the spectral coefficients of the sub- 
band. To generate estimated values for a subsequent set of 
spectral coefficients, a prediction is performed on the basis 
of the time signal of this sub-band. 

It should be noted that this prediction takes place in the 
quasi time domain since the temporal signal on the basis of 
which the prediction is performed is simply the time signal of 
one sub-band of the encoded audio signal and not the time sig- 
nal of the whole spectrum of the audio signal. The time signal 
generated by prediction is subjected to a forward transform to
obtain estimated, i.e. predicted, spectral coefficients for 
the sub-band of the following set of spectral coefficients. If 
it is now established that there are one or more erroneous spectral
coefficients in the following set of spectral coefficients,
the erroneous spectral coefficients can be replaced by
the estimated, i.e. predicted, spectral coefficients. 
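
The data flow just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: an orthonormal DCT-IV stands in for the transform pair (the codec discussed in the text uses the MDCT), equal-width sub-bands are assumed, and `predict_next` is a hypothetical second-order predictor chosen only because it extrapolates a single sinusoid exactly. All function names are invented for the sketch.

```python
import math


def dct_iv(x):
    """Orthonormal DCT-IV. It is its own inverse, so in this sketch it
    serves as both the reverse and the forward transform (assumption)."""
    n = len(x)
    scale = math.sqrt(2.0 / n)
    return [scale * sum(x[k] * math.cos(math.pi / n * (j + 0.5) * (k + 0.5))
                        for k in range(n))
            for j in range(n)]


def split_subbands(coeffs, num_bands):
    """Subdivide a set of coefficients into equal-width sub-bands.
    len(coeffs) is assumed to be divisible by num_bands."""
    width = len(coeffs) // num_bands
    return [coeffs[b * width:(b + 1) * width] for b in range(num_bands)]


def predict_next(signal):
    """Hypothetical predictor: least-squares fit of x[n] = a*x[n-1] - x[n-2],
    then extrapolation by len(signal) samples (needs >= 2 samples)."""
    num = sum(signal[n - 1] * (signal[n] + signal[n - 2])
              for n in range(2, len(signal)))
    den = sum(signal[n - 1] ** 2 for n in range(2, len(signal)))
    a = num / den if den else 0.0
    history = list(signal[-2:])
    out = []
    for _ in range(len(signal)):
        nxt = a * history[-1] - history[-2]
        out.append(nxt)
        history.append(nxt)
    return out


def conceal(current_set, following_set, error_flags, num_bands=2):
    """Estimate the following set per sub-band in the quasi time domain and
    use the estimates only where the received coefficient is flagged."""
    estimated = []
    for band in split_subbands(current_set, num_bands):
        quasi_time = dct_iv(band)              # reverse transform
        predicted = predict_next(quasi_time)   # prediction per sub-band
        estimated.extend(dct_iv(predicted))    # forward transform
    return [est if bad else real
            for real, est, bad in zip(following_set, estimated, error_flags)]
```

Only one predictor call per sub-band is needed, rather than one per spectral coefficient, which is the source of the reduced computational effort discussed next.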

Compared to the pure spectral value prediction, the method ac- 
cording to the present invention for error concealment re- 
quires less computational effort since, as the spectral coef- 
ficients have been grouped together, predictions now have to 
be performed only for each sub-band and no longer for each 
spectral coefficient. Furthermore, the method according to the 
present invention provides a high degree of flexibility since 
the characteristics of the signals to be processed can be 
taken into account. 

The error concealment according to the present invention
works particularly well for tonal signals. It has been discov- 
ered, however, that tonal signal portions are more likely to 
appear in the lower-frequency range of the spectrum of an au- 
dio signal, while the higher-frequency signal portions are 
more likely to be unsteady, i.e. noisy. In terms of the pre- 
sent description, "noisy signal portions" are signal portions 
which are far from steady. These noisy signal portions do not 
have to represent noise in the classical sense, however, but 
simply rapidly changing user signals. 

To enable the computational effort to be reduced still fur- 
ther, it is possible with the present invention to subject 
only the lower-frequency signal portions to a prediction 
whereas higher-frequency signal portions are not processed at 
all. In other words, it is possible to subject only the
lowest/lower sub-band(s) to a reverse transform, a prediction
and a forward transform. 

This characteristic of the present invention, in contrast to a 
complete transforming of the whole audio signal into the time 
domain and a prediction of the whole temporal audio signal 
from block to block using a so-called "long-term" predictor,
constitutes a considerable advantage, since according to the 
present invention the advantages of prediction in the time do- 
main are combined with the advantages of spectral decomposi- 
tion. Only with spectral decomposition is it possible to take 
account of audio signal characteristics which depend on the 
frequency. The number of sub-bands generated from the subdivi- 
sion of the set of spectral coefficients is arbitrary. If only 
two sub-bands are chosen, the advantage of considering the to- 
nality already manifests itself in the lower frequency range 
of the audio signal. If on the other hand many sub-bands are 
chosen, the predictor in the quasi time domain will have a 
relatively short length such that its delay does not become too
large. Since the individual sub-bands are preferably processed 
in parallel, an embodiment of the present invention using a 
hard-wired integrated circuit would require a plurality of 
predictor circuits in parallel. 

If the present invention is employed in connection with a 
transform encoder which uses different block lengths, the ad- 
vantage results that the predictor itself is independent of 
block length and window shape. In addition, due to the reverse 
transform, the dependence on the transform algorithm used, ex- 
plained above in relation to the MDCT, is eliminated. Further- 
more, the concept according to the present invention for error 
concealment furnishes estimated spectral coefficients which, 
due to the reverse transform, the prediction in the time do- 
main and the forward transform, have the right phase, i.e. 
there are no phase jumps in the time signal resulting from a 
predicted spectral coefficient in relation to a time signal of
a preceding intact set of spectral coefficients. As a result 
tonal signals can be substituted for erroneous or missing 
signal portions so well that a normal listener does not even 
realize in most cases that an error has occurred. 

Finally, the method according to the present invention is par- 
ticularly suited for combination with an error concealment 
technique described in DE 197 35 675 A1, which is suitable for
the substitution of noisy signal portions. If tonal signal 
portions of a missing block are concealed by means of the 
method according to the present invention, and if noisy signal 
portions are concealed by means of the known method which has
just been cited, which is based on an energy similarity be- 
tween substituted data and intact data, completely missing 
blocks can be concealed to such an extent as to be practically 
inaudible for a normal listener. 

Brief Description of the Drawings 

Preferred embodiments of the present invention are described 
in detail below making reference to the enclosed drawings, in 
which 

Fig. 1 shows a decoder having an error concealment unit ac- 
cording to the present invention; 

Fig. 2 shows a detailed block diagram of the error conceal- 
ment unit of Fig. 1; 

Fig. 3 shows a detailed block diagram of the error conceal- 
ment unit of Fig. 1 which also provides noise substitu- 
tion and which works according to the prediction gain; 

Fig. 4 shows a flowchart for the method for error concealment
according to the present invention; 

Fig. 5 shows a detailed block diagram of a preferred embodi- 
ment of the error concealment unit for an MPEG-2 AAC 
decoder; 

Fig. 6 shows a detailed block diagram of the predictor of 
Fig. 5; and 

Fig. 7 shows a schematic representation of the block struc- 
ture according to the AAC standard. 

Detailed Description of Preferred Embodiments 

Fig. 1 shows a block diagram of a decoder according to a pre- 
ferred embodiment of the present invention. The decoder block 
diagram shown in Fig. 1 corresponds essentially to the MPEG-2 
AAC decoder as defined in the standard MPEG-2 AAC 13818-7. The 
encoded audio signal is first fed into a bit stream demulti- 
plexer 100 in order to separate spectral data and side infor- 
mation. The Huffman coded spectral coefficients are then fed 
into a Huffman decoder 200 so as to obtain quantized spectral 
values from the Huffman code words. The quantized spectral 
values are then fed into an inverse quantizer 300 and the re- 
spective scale factor bands are then multiplied by appropriate 
scale factors. The decoder according to the present invention 
can incorporate a plurality of additional functional units 
following the inverse quantizer 300, e.g. a middle/side stage,
a predictor stage, a TNS stage, etc., as specified in the 
standard. 

According to a preferred embodiment of the present invention 
the decoder includes an error concealment unit 500 which immediately
precedes a synthesis filter bank 400, which functions
according to the present invention, and which ensures
that the effects of transmission errors in the encoded audio 
signal fed into the bit stream demultiplexer 100 can be miti- 
gated or made completely inaudible. In other words, the error 
concealment unit 500 ensures that transmission errors are con- 
cealed, i.e. that they are not or are only faintly audible in 
a temporal audio signal at the output of the synthesis filter 
bank. 

Fig. 2 shows a general block diagram of the error concealment 
unit 500. This includes a reverse transform unit 502, a unit 
504 for generating estimated values and a forward transform 
unit 506. Both the reverse transform unit 502 and the forward 
transform unit 506 can be controlled according to the current 
block type via a block type line 508. The error concealment 
unit 500 also includes a parallel branch which enables the 
spectral coefficients on the input side to be routed directly 
from the input to the output bypassing the reverse transform 
unit 502, the unit for generating estimated values 504 and the 
forward transform unit 506. This parallel branch contains a 
time delay stage 510 so as to ensure that estimated spectral 
coefficients for a subsequent block which appear behind the 
forward transform unit 506 arrive at an error replacement unit
512 simultaneously with "real", possibly erroneous spectral 
coefficients for the subsequent block, so that it is possible 
to replace any erroneous spectral coefficients in the real 
spectral coefficients for the subsequent block by estimated 
spectral coefficients for the subsequent block. This spectral 
value replacement is represented in Fig. 2 by a switch symbol 
512. It should be noted that the error replacement unit 512 
can operate on a spectral value level, or on a block or set 
level. Depending on the requirements, it can also operate on 
the sub-band level. The subsequent set of spectral coeffi- 
cients, wherein any originally erroneous spectral coefficients 
have been replaced by estimated spectral coefficients, i.e.
wherein errors have been concealed, thus appears at the output 
of the error replacement unit 512. 

It should be pointed out here that the block diagram shown in 
Fig. 2 represents only a part of the error concealment unit 
500. This representation has however been chosen for reasons 
of clarity. As will be described in more detail in Fig. 5 with 
reference to a preferred embodiment of the present invention, 
the circuit shown in Fig. 2 is preceded by a unit for subdi- 
viding into sub-bands. As a counterpart thereto, the error re- 
placement unit 512 is followed by a unit for cancelling the 
subdivision into sub-bands so that the filter bank 400 (Fig.
1) receives a "normal" set of spectral coefficients without
noticing anything about the preceding error concealment. The
error concealment unit 500 (Fig. 1) thus includes a plurality
of the circuits described with reference to Fig. 2, namely one 
circuit per sub-band. The parallel circuits are connected on 
the input side by the unit for subdividing and on the output 
side by the unit for cancelling the subdivision, as will be 
described in detail later. 

It has already been pointed out that modern transform encoders 
use short windows so as to increase the temporal resolution in 
the event of transients in an audio signal which is to be en- 
coded. Here it is usually the case that the number of temporal 
sampled values or the number of spectral coefficients in a 
long window or block is an integral multiple of the number of 
temporal sampled values or the number of spectral coefficients 
in a short window or block. An advantage of the present inven- 
tion is that the unit 504 for generating estimated values can 
operate independently of the transform, the block length and 
the window type which are used. Both the reverse transform 
unit 502 and the forward transform unit 506 are therefore con- 
trolled according to the block type so that the same number of 
temporal sampled values is always presented to or emerges from
the unit 504 for generating estimated values. 

This property will now be illustrated further by making use of 
Fig. 7 to represent the situation for MPEG-2 AAC. Fig. 7 has a 
time axis 700 in terms of which the extent of a long block 702 
is represented. A long block comprises 2048 sampled values, 
resulting in 1024 spectral coefficients if the windows overlap 
by 50% as is known. Background details of the modified discrete
cosine transform (MDCT) which is used and of window overlapping
are to be found in the already cited standard. In Fig.
7 eight short blocks 704 are also depicted, each of which has 
256 sampled values, again resulting in 128 spectral coeffi- 
cients due to the 50% overlap. For reasons of clarity, the 
overlapping of the short blocks and the overlapping of the 
long block with a preceding long block or with a preceding or 
subsequent start or stop window have not been shown in Fig. 7. 
However, it is clear from Fig. 7 that the number of spectral 
coefficients in a long block is equal to eight times the num- 
ber of spectral coefficients in a short block. Put another 
way, a long block encompasses the same time duration of the 
audio signal as do eight short blocks. 
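
The block arithmetic above can be checked with a few lines of Python. The helper name `coefficients_per_block` is an assumption made for this sketch; the figures 2048, 256 and 50% are those given in the text.

```python
def coefficients_per_block(window_samples, overlap=0.5):
    """Spectral coefficients contributed by one window at the given overlap."""
    return int(window_samples * (1.0 - overlap))


long_coeffs = coefficients_per_block(2048)    # long block
short_coeffs = coefficients_per_block(256)    # short block
ratio = long_coeffs // short_coeffs           # short blocks per long block
```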

As is shown in Fig. 2, the reverse transform unit 502 is con- 
trolled via the block type line 508 in such a way that it per- 
forms eight successive reverse transforms of the spectral co- 
efficients in the corresponding sub-bands of short blocks and 
arranges the resulting quasi time signals serially next to one 
another so as to provide the unit 504 for generating estimated 
values with a time signal of a certain length. As a counter- 
part to this, the forward transform unit 506 will also perform 
eight successive forward transforms on the values which are 
issued serially by the unit 504 for generating estimated val- 
ues. This "operating cycle" thus ensures that in the case of 
short blocks the same number of spectral coefficients is output
as in the case of long blocks. The spectral coefficients
which are output by the error concealment unit 500 in an "op- 
erating cycle" are termed a set of estimated spectral coeffi- 
cients in the sense of the present invention. On the grounds 
of practicability the number of spectral coefficients in a set 
is the same as the number of spectral coefficients in a long 
block and the number of spectral coefficients in eight short 
blocks. It is obvious that other ratios between long and short 
blocks can be chosen, e.g. 2, 4 or 16. Normally the situation
will be such that the number of spectral coefficients in a 
long block will be divisible by the number of spectral coeffi- 
cients in a short block. Should this not be so for some rea- 
son, however, the number of spectral coefficients in a set 
would be equal to the least common multiple of the long and short
block lengths so as to achieve independence from the block type at
the predictor level, i.e. in the unit 504 for generating esti- 
mated values. 
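
As a small worked example of this rule, the set size and the number of transforms per operating cycle might be computed as follows. The function names are assumptions, and the block length of 384 coefficients is a purely hypothetical value used to show the non-divisible case.

```python
from math import gcd


def coefficients_per_set(long_coeffs, short_coeffs):
    """Least common multiple of the two block lengths (in coefficients)."""
    return long_coeffs * short_coeffs // gcd(long_coeffs, short_coeffs)


def transforms_per_set(set_size, block_coeffs):
    """How many successive transforms of one block type fill one set."""
    return set_size // block_coeffs
```

For the AAC figures from the text, `coefficients_per_set(1024, 128)` gives 1024, so one long-block transform or eight short-block transforms make up an operating cycle.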

Fig. 3, which represents a preferred development of the error 
concealment unit of Fig. 2, will now be considered. An impor- 
tant feature here is that the error concealment unit has been 
provided with a noise replacement unit 514 which, in place of 
the forward transform unit 506, can be connected to the error 
replacement unit via a noise replacement switch 518 depending 
on a prediction gain signal 516. The noise replacement unit 
514 operates according to the method described in DE 197 35 
675 A1 so as to approximate noisy signal content. Since noisy
signal content is involved, the phase of the spectral coeffi- 
cients is no longer considered but simply the energy of a num- 
ber of spectral coefficients in a subgroup. Depending on the 
energy in a subgroup of the last intact audio data, the noise 
replacement unit 514 generates a corresponding subgroup of 
spectral coefficients, the energy in the subgroup of generated 
spectral coefficients equalling the energy of the correspond- 
ing subgroup of the preceding spectral coefficients or being 
derived from it. The phases of the spectral coefficients generated
in the noise replacement process are, however, specified
randomly.
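
A minimal Python sketch of such an energy-matched noise replacement is given below. It is not the method of the cited document in detail: the function name is invented, and for real-valued coefficients the random phase is modelled simply by drawing random values of either sign and scaling them to the energy of the last intact subgroup.

```python
import math
import random


def noise_substitute(intact_subgroup, rng=None):
    """Generate a subgroup of random coefficients whose energy equals the
    energy of the corresponding subgroup of the last intact data."""
    if rng is None:
        rng = random.Random()
    energy = sum(c * c for c in intact_subgroup)
    raw = [rng.uniform(-1.0, 1.0) for _ in intact_subgroup]
    raw_energy = sum(r * r for r in raw)
    if raw_energy == 0.0:
        return [0.0] * len(intact_subgroup)
    gain = math.sqrt(energy / raw_energy)   # scale to the target energy
    return [gain * r for r in raw]
```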

The noise replacement switch 518 is controlled by a prediction 
gain signal 516. In general the prediction gain depends on the 
way the output signal of the unit 504 for generating estimated 
values relates to the input signal. If it is found that the 
output signal in a sub-band is substantially the same as the 
input signal, it can be assumed that the audio signal in this 
sub-band is relatively steady, i.e. tonal. If, on the other 
hand, the output signal of the predictor differs markedly from 
the input signal, it can be assumed that the audio signal in 
this sub-band is relatively unsteady, i.e. atonal or noisy. In 
this case a noise replacement will provide better results than 
a prediction since noisy signals cannot per se be reliably 
predicted. The noise replacement switch 518 could, for exam- 
ple, be so controlled that it connects the forward transform 
unit 506 to the error replacement unit 512 when the prediction 
gain exceeds a certain threshold and connects the noise re- 
placement unit 514 to the error replacement unit 512 when the 
prediction gain does not exceed this threshold, thus combining 
the two substitution methods in an optimal way. 
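
The switching rule can be sketched as below. The text does not
define the prediction gain numerically, so the usual ratio of
signal energy to prediction-error energy in dB is assumed here,
and the 6 dB threshold is a purely hypothetical value.

```python
import math


def prediction_gain_db(signal, predicted):
    """Assumed definition: 10*log10(signal energy / prediction error energy)."""
    signal_energy = sum(s * s for s in signal)
    error_energy = sum((s - p) ** 2 for s, p in zip(signal, predicted))
    if error_energy == 0.0:
        return math.inf   # perfectly predicted, i.e. strongly tonal
    if signal_energy == 0.0:
        return -math.inf  # silent input but non-zero error
    return 10.0 * math.log10(signal_energy / error_energy)


def select_replacement(gain_db, predicted_coeffs, noise_coeffs, threshold_db=6.0):
    """Model of switch 518: prediction above the threshold, noise otherwise."""
    return predicted_coeffs if gain_db > threshold_db else noise_coeffs
```

A sub-band whose predictor output tracks the input closely
yields a high gain and is concealed by prediction; a sub-band
with a poor match falls back to noise replacement.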

The method of noise substitution according to the present in- 
vention will now be considered in more detail making reference 
to Fig. 4. First, a current set of spectral coefficients is
received (10). For reasons of clarity it is assumed in Fig. 4
that the current set of spectral coefficients consists
entirely of intact spectral coefficients or has already been
subjected to an error concealment method as shown in Fig. 2 or
Fig. 3. On the one hand the current set of spectral
coefficients is processed by the filter bank 400 (Fig. 1) and
output e.g. to a loudspeaker (12). On the other hand the current
set of spectral coefficients is used to predict or estimate a
subsequent set of spectral coefficients. To achieve this
according to the present invention, the current set of spectral
coefficients is subdivided into sub-bands (14). In the case
of a long block the subdivision into sub-bands is effected by 
generating just one sub-band with a corresponding frequency 
range for each set. In the case of short blocks the current 
set of spectral coefficients will consist of a plurality of 
successive complete spectra. Then, in step 14, corresponding 
sub-bands are generated for each complete spectrum, i.e. a 
plurality of sub-bands for each set of spectral coefficients. 
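
Step 14 can be illustrated with the following sketch. It
assumes that a set is a flat list in which the complete spectra
of short blocks are stored one after another (not interleaved),
and that all sub-bands have equal width; the function name and
the default of 32 sub-bands are illustrative choices.

```python
def subdivide(coeff_set, n_subbands=32, n_blocks=1):
    """Split a set of spectral coefficients into sub-bands.

    Returns sub_bands[b][j]: the coefficients of sub-band b taken from
    complete spectrum j (n_blocks = 1 for a long block, N for short blocks).
    """
    block_len, rest = divmod(len(coeff_set), n_blocks)
    if rest:
        raise ValueError("set length must be a multiple of n_blocks")
    width, rest = divmod(block_len, n_subbands)
    if rest or width == 0:
        raise ValueError("block length must be a multiple of n_subbands")
    return [
        [coeff_set[j * block_len + b * width: j * block_len + (b + 1) * width]
         for j in range(n_blocks)]
        for b in range(n_subbands)
    ]
```

For a set of 1024 coefficients this gives 32 sub-bands of 32
coefficients for one long block, and 32 sub-bands holding eight
groups of 4 coefficients each for eight short blocks.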

After subdivision into sub-bands a reverse transform is
performed for each sub-band (16). In the case of long blocks,
where the number of spectral coefficients in a block is equal 
to the number of spectral coefficients in a set, a single re- 
verse transform is performed for each sub-band prior to the 
prediction 18. In the case of short blocks several reverse 
transforms corresponding to the sub-bands of each "short" 
spectrum are performed before a prediction 18 is effected for 
all the sub-bands together. 

The prediction 18 takes place in the quasi time domain, i.e. 
for each sub-band "time" signal, so as to obtain an estimated 
sub-band time signal for the subsequent set. This estimated 
quasi time signal is then subjected to a forward transform 20, 
again once only for a long block and N times for short blocks, 
N being the ratio of the number of spectral coefficients of a 
long block to the number of spectral coefficients of a short 
block. 

After step 20 estimated spectral coefficients are available 
for each sub-band. In a step 22 the subdivision introduced in 
step 14 is revoked again so that a subsequent set of spectral 
coefficients is obtained after step 22. 
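
Revoking the subdivision (step 22) amounts to writing the
estimated sub-band coefficients back into one flat set. The
sketch below assumes the nested layout sub_bands[b][j] (sub-band
b, complete spectrum j) and consecutive storage of short
spectra; both are illustrative assumptions.

```python
def merge_sub_bands(sub_bands):
    """Reassemble a flat set of coefficients from sub_bands[b][j]."""
    n_blocks = len(sub_bands[0])
    merged = []
    for j in range(n_blocks):      # one complete spectrum after another
        for band in sub_bands:     # sub-bands in ascending frequency order
            merged.extend(band[j])
    return merged
```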

In a step 24 the subsequent set of spectral coefficients is
received by the decoder. This set undergoes error detection 26 
in order to establish whether one spectral coefficient, sev- 
eral spectral coefficients or all spectral coefficients of the 
subsequent set are erroneous. The error detection is effected 
in a way which is known to persons skilled in the art, e.g. by 
checking the CRC checksum (CRC = Cyclic Redundancy Code) over
a block. If it is found that a checksum that is calculated on 
the basis of the transmitted data differs from the checksum 
transmitted with the data, the estimated spectral coefficients 
generated by step 22 can be adopted instead of the spectral 
coefficients of the erroneous block. The erroneous spectral 
coefficients are thus replaced by the estimated spectral
coefficients (28). Finally the error-concealed spectral
coefficients of the subsequent set are processed so as to be
able to output the temporal sampled values (30).
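
Steps 26 and 28 can be sketched as follows. The CRC-32 from the
Python standard library stands in for whatever checksum the
bitstream actually carries, and the function and parameter
names are hypothetical.

```python
import zlib


def conceal_if_corrupt(payload, transmitted_crc, decoded_coeffs, estimated_coeffs):
    """Return (coefficients to use, error flag) for one block.

    Assumption: zlib.crc32 replaces the codec's real checksum.
    """
    if zlib.crc32(payload) == transmitted_crc:
        return list(decoded_coeffs), False   # block is intact
    return list(estimated_coeffs), True      # use the estimates of step 22
```

The sketch replaces a whole block; replacing only individual
erroneous coefficients would need a finer-grained error
indication than a single checksum per block.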

The flowchart of Fig. 4 essentially represents a snapshot of 
the processing which takes place from one set of spectral co- 
efficients to the next set of spectral coefficients. If the 
flowchart of Fig. 4 is implemented it is obvious that e.g. 
only a single filter bank 400 (Fig. 1) is used to perform the
steps 12 and 30. Equally, it is obvious that only a single 
unit is needed to receive the current set of spectral coeffi- 
cients and to receive the subsequent set of spectral coeffi- 
cients to implement the steps 10 and 24. Temporal synchronic- 
ity for the steps 10 and 24 in a device which implements the 
method according to the present invention is ensured by the 
time delay stage 510 in the parallel branch (Fig. 2).

Fig. 5 shows a more detailed representation of the general 
block diagram of Fig. 2 for the example of an MPEG-2 AAC 
transform encoder featuring the error concealment unit 500 ac- 
cording to the present invention. As has already been ex- 
plained with reference to Fig. 2, the error concealment unit 
500 (Fig. 1) includes a unit 520 for subdividing the blocks
of spectral coefficients into, preferably, 32 sub-bands. In 
the case of long blocks each sub-band has 32 spectral coeffi- 
cients. Since the sub-bands of the short blocks span the same 
frequency range, each sub-band has 4 spectral coefficients in 
the case of short blocks. A subdivision of a complete spectrum 
into sub-bands of the same size is preferred on the grounds of 
simplicity, though a subdivision into unequal sub-bands would 
also be possible, e.g. to reflect the psychoacoustical fre- 
quency groups. Each sub-band is then subjected to an inverse 
modified discrete cosine transform. In the case of long blocks 
the IMDCT is performed once and receives 32 input values. In 
the case of short blocks eight successive IMDCTs are per- 
formed, each with 4 of the spectral coefficients, so that 32 
quasi time sampled values again result at the output. These 
are then passed on to the predictor 504, which in turn gener- 
ates 32 estimated quasi time sampled values which are trans- 
formed by the MDCT 506. In the case of long blocks a single 
MDCT is performed with 32 temporal values, whereas in the case 
of short blocks eight successive MDCTs are performed, each 
having 4 sampled values. Although only one branch for the 0-th 
sub-band is shown in Fig. 5, it should be noted that an iden- 
tical branch exists for each sub-band if all the sub-bands are 
of the same length. If the sub-bands are of different lengths, 
the orders of the IMDCT or MDCT are adapted accordingly. For 
the purposes of a practical implementation an obvious choice 
is parallel processing. Obviously, however, serial processing 
of the sub-bands is also possible, if sufficient storage ca- 
pacity is available. The output values of the MDCT 506 for 
each sub-band are fed to a unit 522 for reversing the subdivi- 
sion, i.e. into an inverse subdivision unit, so as to output 
an estimated set of spectral values for the preferred embodi- 
ment at the AAC MDCT level. 
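
The per-sub-band transform chain can be sketched as follows.
As a stand-in for the IMDCT/MDCT pair, the sketch uses an
orthonormal DCT-IV, which is its own inverse and maps N values
to N values; this substitution is an assumption made to keep the
example short, and all function names are hypothetical.

```python
import math


def dct4(x):
    """Orthonormal DCT-IV; applying it twice returns the input."""
    n = len(x)
    scale = math.sqrt(2.0 / n)
    return [
        scale * sum(x[i] * math.cos(math.pi / n * (i + 0.5) * (k + 0.5))
                    for i in range(n))
        for k in range(n)
    ]


def sub_band_to_quasi_time(spectra):
    """One transform per spectrum: 1 x 32 values (long) or 8 x 4 values (short)."""
    samples = []
    for spectrum in spectra:
        samples.extend(dct4(spectrum))
    return samples


def quasi_time_to_sub_band(samples, n_blocks):
    """Forward transform of the (estimated) quasi time signal, block by block."""
    width = len(samples) // n_blocks
    return [dct4(samples[j * width:(j + 1) * width]) for j in range(n_blocks)]
```

In both cases 32 quasi time values per sub-band reach the
predictor, so the same predictor length can serve long and
short blocks.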

Fig. 6 shows a further detailed representation of the
predictor 504. The heart of the predictor 504 in the preferred
embodiment is a so-called LMSL predictor 504a with a length of
n = 32. Details of the LMSL predictor can be found in the book
"Adaptive Signal Processing", Bernard Widrow, Samuel Stearns,
Prentice-Hall, 1995, p. 99 ff. The LMSL predictor 504a is
preceded by a time delay stage 504b. The predictor 504 also
includes a parallel-series converter 504c on the input side and
a series-parallel converter 504d on the output side. It also
has a prediction gain calculator 504e which compares the out- 
put signal of the predictor 504a with the input signal in or- 
der to establish whether a steady signal or an unsteady signal 
has been processed. On the output side the prediction gain 
calculator 504e supplies the prediction gain signal 516, which 
is used to control the switch 518 (Fig. 3) so as to employ
either predicted spectral coefficients or spectral coeffi- 
cients gained by noise substitution for the purposes of error 
concealment. In its implementation as LMSL predictor the pre- 
dictor 504 also includes two switches 504f and 504g, which 
have two switch settings. The switch setting "1" applies when 
the spectral coefficients of the subsequent block are
error-free and the switch setting "2" applies when the spectral
coefficients of the subsequent set are erroneous. Fig. 6 shows
the case where the spectral coefficients are erroneous. In 
this case a reference signal with a value of 0 is fed into the 
predictor at the switch 504g instead of the input signal. In 
the case of error-free spectral coefficients (switch setting
"1" of the switch 504g), on the other hand, the output values
of the parallel-series converter are fed into the LMSL predic- 
tor from below. 
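
An adaptive predictor of this general kind can be sketched as
follows. The sketch uses a normalised LMS update as a stand-in;
it is not claimed to reproduce the LMSL predictor 504a or the
exact switch wiring of Fig. 6. The class name, the step size
and the free-running extrapolation used for the error case are
assumptions.

```python
import math


class LmsPredictor:
    """One-step-ahead linear predictor with a normalised LMS update."""

    def __init__(self, order=32, mu=0.5, eps=1e-9):
        self.weights = [0.0] * order
        self.history = [0.0] * order  # most recent sample first
        self.mu = mu
        self.eps = eps

    def predict(self):
        return sum(w * x for w, x in zip(self.weights, self.history))

    def update(self, sample):
        """Error-free case: adapt on a known sample, return the prediction error."""
        error = sample - self.predict()
        norm = self.eps + sum(x * x for x in self.history)
        gain = self.mu * error / norm
        self.weights = [w + gain * x for w, x in zip(self.weights, self.history)]
        self.history = [sample] + self.history[:-1]
        return error

    def extrapolate(self, count):
        """Error case (simplified): run freely on own output, no adaptation."""
        estimates = []
        for _ in range(count):
            estimate = self.predict()
            estimates.append(estimate)
            self.history = [estimate] + self.history[:-1]
        return estimates
```

On a steady sinusoid the prediction error shrinks towards
zero, which is the behaviour a prediction gain calculator would
interpret as a tonal signal.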

If the error concealment method according to the present in- 
vention is used in connection with an AAC encoder, the pre- 
ferred option is to use the corresponding transform algorithms 
(MDCT or IMDCT) for all the forward and reverse transforms.
For error concealment it is not, however, necessary that the
same transform method is employed for the reverse or forward 
transform as was used when encoding the audio signal to form 
the spectral coefficients. 

Due to the subdivision of the spectrum into sub-bands and due 
to the individual transforms for each sub-band, frequency-time 
domain transforms of lower order than the frequency resolution 
are used appropriately for each sub-band. As a result special 
estimated values for tonal signal portions are generated in 
the intermediate level by means of the predictor. Time- 
frequency domain transforms of lower order than the original 
frequency resolution are used appropriately as forward trans- 
form/synthesis, the same order being chosen as for the fre- 
quency-time domain transform which is used. Thus error con- 
cealment according to the present invention provides flexibil- 
ity through using advance knowledge of the spectral properties 
of audio signals and also independence from the transform 
method used in the encoder through the generation of estimated 
values in the quasi time signal, i.e. not at the spectral co- 
efficient level. If the prediction in the quasi time domain is 
used to replace tonal signal portions and if the noise re- 
placement is used for noisy spectral portions, errors for a 
large class of audio signals can be concealed to such an ex- 
tent that, even in the case of complete block loss, there is 
practically no audible disturbance. Trials have shown that, 
for not too critical test signals, normal listeners, i.e. un- 
trained test listeners, have heard irregularities in the audio 
signal only in one case out of 10 even when there has been 
complete block loss. 

Claims

1. A method for concealing an error in an encoded audio sig- 
nal, where the encoded audio signal has successive sets 
of spectral coefficients, where a set of spectral
coefficients is a spectral representation for a set of audio
sampled values, comprising the following steps: 

subdividing a current set of spectral coefficients into 
at least two sub-bands with different frequency ranges, 
where one sub-band of the at least two sub-bands has at 
least two spectral coefficients; 

reverse transforming the spectral coefficients of the one 
sub-band to obtain a temporal representation of the at 
least two spectral coefficients of the one sub-band; 

performing a prediction using the temporal representation 
of the at least two spectral coefficients of the one sub- 
band to obtain an estimated temporal representation for a 
sub-band of a set following the current set, where the 
sub-band of the following set has the same frequency 
range as the sub-band of the current set; 

forward transforming the estimated temporal representa- 
tion to obtain at least two estimated spectral coeffi- 
cients for the sub-band of the following set; 

determining whether a spectral coefficient of the sub- 
band of the following set is erroneous; and 

as reaction to the step of determining, if there is an 
erroneous spectral coefficient, using an estimated
spectral coefficient instead of an erroneous spectral
coefficient of the following set so as to conceal the erroneous
spectral coefficient of the following set. 

2. A method according to claim 1, wherein the one sub-band
that is processed in the step of reverse transforming has
low-frequency spectral coefficients and the other of the
at least two sub-bands has higher-frequency spectral
coefficients.

3. A method according to claim 1, wherein the number of
spectral coefficients in a set of spectral coefficients 
is equal to the number of spectral coefficients in a 
block of the first length and is N times the number of 
spectral coefficients in a block of the second length, 
and wherein N blocks of the second length follow each 
other, where 

the step of subdividing is performed in such a way that 
the sub-bands of the blocks of the first length have the 
same frequency ranges as the sub-bands of the blocks of 
the second length, so that the number of spectral coeffi- 
cients of a sub-band of the block of the first length is 
equal to N times the number of spectral coefficients of 
the corresponding sub-band of the block of the second 
length; 

the step of reverse transforming is performed in succes- 
sion for each corresponding sub-band of the N blocks of 
the second length to obtain a temporal representation of 
the spectral coefficients of the corresponding sub-bands 
of the N blocks of the second length; 

the step of performing a prediction is effected with the 
temporal representation of all the corresponding sub- 
bands of the N blocks of the second length; and 

the step of forward transforming is performed
successively for each corresponding sub-band of the N blocks of
the second length. 

4. A method according to claim 1, wherein a plurality of 
sub-bands is generated in the step of subdividing such 
that all the sub-bands together form the spectral
representation of the encoded audio signal in a set of
spectral coefficients.

5. A method according to claim 1, wherein the following step 
is performed after the step of determining whether a 
spectral coefficient of a sub-band is erroneous: 

determining whether the spectral coefficient represents a 
tonal portion of the uncoded audio signal by comparing 
the spectral coefficient with the corresponding estimated 
spectral coefficient; 

if the spectral coefficient is found to be tonal, using 
the estimated spectral coefficient, and, if the spectral 
coefficient is found to be non-tonal, performing a noise 
substitution for an erroneous spectral coefficient of the 
following set. 

6. A method according to claim 3, wherein the spectral coef- 
ficients are MDCT coefficients, the length of a set cor- 
responds to the length of a long block and has 1024 MDCT 
coefficients, while a set of spectral coefficients com- 
prises eight short-length blocks, each with 128 MDCT co- 
efficients, and wherein 32 sub-bands, each with 32 MDCT 
coefficients for a long block or each with 4 MDCT
coefficients for a short block, are formed in the step of
subdividing.

7. A method according to claim 1, wherein an adaptive back- 
coupled predictor, preferably an LMSL predictor, is used 
in the step of performing the prediction. 

8. A method according to claim 1, wherein the transform al- 
gorithm which forms the basis of the encoded audio signal 
is the same transform algorithm that is used in the step 
of reverse transforming and in the step of forward
transforming.

9. A method according to claim 1, wherein the transform al- 
gorithm which is used in the step of reverse transforming 
is the exact inverse of the transform algorithm that is 
used in the step of forward transforming. 

10. A method for decoding an encoded audio signal which com- 
prises successive sets of spectral coefficients, wherein 
a set of spectral coefficients is a spectral representa- 
tion for a set of audio sampled values: 

receiving a current set of spectral coefficients; 

subdividing a current set of spectral coefficients into 
at least two sub-bands with different frequency ranges, 
where one sub-band of the at least two sub-bands has at 
least two spectral coefficients; 

reverse transforming the spectral coefficients of the one 
sub-band to obtain a temporal representation of the at 
least two spectral coefficients of the one sub-band; 

performing a prediction using the temporal representation 
of the at least two spectral coefficients of the one
sub-band to obtain an estimated temporal representation for a
sub-band of a set following the current set, where the 
sub-band of the following set has the same frequency 
range as the sub-band of the current set; 

forward transforming the estimated temporal representa- 
tion to obtain at least two estimated spectral coeffi- 
cients for the sub-band of the following set; 

receiving a following set of spectral coefficients and 
subdividing the following set into sub-bands which cover 
the same frequency range as the sub-bands of the current 
set; 

determining whether a spectral coefficient of the sub- 
band of the following set is erroneous; 

as reaction to the step of determining, if there is an 
erroneous spectral coefficient, using an estimated spec- 
tral coefficient instead of an erroneous spectral coeffi- 
cient of the following set so as to conceal the erroneous 
spectral coefficient of the following set; and 

processing the following set using the estimated spectral 
coefficient used in the step of using to obtain the fol- 
lowing set of audio sampled values. 

11. A method according to claim 10, wherein the spectral
coefficients of the encoded audio signal are entropy-coded
and quantized, which includes the following steps before
the step of receiving the current set or the following
set:

cancelling the entropy coding to obtain quantized
spectral coefficients;

requantizing the quantized spectral coefficients to
obtain requantized spectral coefficients;

and wherein the step of processing includes the following 
step:

reverse transforming the following set using a transform 
algorithm which is inverse to the transform algorithm 
used for transforming to obtain the spectral coefficients 
of the encoded audio signal. 

12. A device for concealing an error in an encoded audio
signal, where the encoded audio signal has successive sets
of spectral coefficients, where a set of spectral coeffi- 
cients is a spectral representation for a set of audio 
sampled values, with the following features: 

a unit for subdividing a current set of spectral coeffi- 
cients into at least two sub-bands with different fre- 
quency ranges, where one sub-band of the at least two 
sub-bands has at least two spectral coefficients; 

a unit for reverse transforming the spectral coefficients 
of the one sub-band to obtain a temporal representation 
of the at least two spectral coefficients of the one sub- 
band; 

a unit for performing a prediction using the temporal 
representation of the at least two spectral coefficients 
of the one sub-band to obtain an estimated temporal rep- 
resentation for a sub-band of a set following the current 
set, where the sub-band of the following set has the same 
frequency range as the sub-band of the current set; 

a unit for forward transforming the estimated temporal
representation to obtain at least two estimated spectral 
coefficients for the sub-band of the following set; 

a unit for determining whether a spectral coefficient of 
the sub-band of the following set is erroneous; and 

a unit for using an estimated spectral coefficient
instead of an erroneous spectral coefficient of the
following set so as to conceal the erroneous spectral
coefficient of the following set.

13. A device for decoding an encoded audio signal which
comprises successive sets of spectral coefficients, where a
set of spectral coefficients is a spectral representation 
for a set of audio sampled values: 

a unit for receiving a current set of spectral coeffi- 
cients; 

a unit for subdividing a current set of spectral coeffi- 
cients into at least two sub-bands with different fre- 
quency ranges, where one sub-band of the at least two 
sub-bands has at least two spectral coefficients; 

a unit for reverse transforming the spectral coefficients 
of the one sub-band to obtain a temporal representation 
of the at least two spectral coefficients of the one sub- 
band; 

a unit for performing a prediction using the temporal 
representation of the at least two spectral coefficients 
of the one sub-band to obtain an estimated temporal rep- 
resentation for a sub-band of a set following the current 
set, where the sub-band of the following set has the same 
frequency range as the sub-band of the current set;

a unit for forward transforming the estimated temporal 
representation to obtain at least two estimated spectral 
coefficients for the sub-band of the following set; 

a unit for receiving a following set of spectral coeffi- 
cients and for subdividing the following set into sub- 
bands which cover the same frequency range as the sub- 
bands of the current set; 

a unit for determining whether a spectral coefficient of 
the sub-band of the following set is erroneous; 

a unit for using an estimated spectral coefficient
instead of an erroneous spectral coefficient of the
following set so as to conceal the erroneous spectral
coefficient of the following set; and

a unit for processing the following set using the
estimated spectral coefficient to obtain the following set of
audio sampled values.



Method and Device for Error Concealment in an Encoded 
Audio Signal and Method and Device for Decoding an 
Encoded Audio Signal 



Abstract 



In a method for concealing an error in an encoded audio signal 
a set of spectral coefficients is subdivided into at least two 
sub-bands, whereupon the sub-bands are subjected to a reverse
transform. A specific prediction is performed for each quasi 
time signal of a sub-band to obtain an estimated temporal rep- 
resentation for a sub-band of a set of spectral coefficients 
following the current set. A forward transform of the time 
signal of each sub-band provides estimated spectral coeffi- 
cients which can be used instead of erroneous spectral coeffi- 
cients of a following set of spectral coefficients, e.g. in 
order to conceal transmission errors. Transforming at the sub- 
band level provides independence from transform characteris- 
tics such as block length, window type and MDCT algorithm 
while at the same time preserving spectral processing for er- 
ror concealment. Thus the spectral characteristics of audio 
signals can also be taken into account during error
concealment.

(54) Title: METHOD AND DEVICE FOR ERROR CONCEALMENT IN AN ENCODED AUDIO SIGNAL
AND METHOD AND DEVICE FOR DECODING AN ENCODED AUDIO SIGNAL

(57) Abstract

A method for error concealment in an encoded audio signal, whereby a set of spectral
coefficients is divided into at least two sub-bands (14), whereupon said sub-bands undergo
reverse transformation (16). A specific prediction is performed (18) for each quasi time
signal of a sub-band in order to obtain an estimated temporal representation for a sub-band
of a set of spectral coefficients following the current set. Forward transformation (20) of
the time signal of each sub-band provides estimated spectral coefficients that can be used (28)
instead of defective spectral coefficients of a subsequent set of spectral coefficients in order
to conceal transmission errors, for example. Sub-band transformation provides independence
from transformation characteristics such as block length, window type or MDCT algorithm,
while at the same time ensuring that spectral processing is maintained for error concealment,
whereby the spectral characteristics of audio signals can also be taken into account during
said error concealment.


Description

The present invention relates to the encoding and decoding of 
audio signals and in particular to error concealment in digi- 
tal encoded audio signals. 

As a result of the increasingly widespread use of modern audio 
encoders and the corresponding audio decoders, which operate 
according to one of the MPEG standards, the transmission of 
encoded audio signals over radio networks or line-based net- 
works such as the internet has already become very important. 
The transmission channel involved in the transmission of en- 
coded audio signals by means of digital radio or over line- 
based networks is not ideal, which can result in encoded audio 
signals being adversely affected during the transmission. The 
decoder is therefore confronted with the question of how to 
deal with transmission errors, i.e. how these transmission er- 
rors are to be "concealed". The objective of error concealment 
is to manipulate transmission errors in such a way as to im- 
prove the subjective auditory sensation arising from such an 
error-afflicted decoded audio signal. 

Many error concealment methods are already known. The simplest 
type of error concealment is that of "muting". When a decoder 
recognizes that data are missing or are erroneous, it inter- 
rupts the reproduction. The missing data are thus replaced by 
a zero signal. In this way the decoder is prevented from issu- 
ing sounds which, due to a transmission error, would be found 
too loud or disconcerting. Because of psychoacoustic effects, 
however, the resulting sudden fall in the signal energy and 
its sudden rise when the decoder issues error-free data again
is found disconcerting. 

Another known method which avoids the sudden fall and subse- 
quent rise in the signal energy is that of data repetition. If 
e.g. one or more blocks of audio data are missing, part of the 
data last transmitted are repeated in a loop until error-free, 
i.e. intact, audio data are available again. This method pro- 
duces disturbing artefacts, however. If only short parts of 
the audio signal are repeated, the repeated signal sounds me- 
chanical whatever the original signal may have been like, hav- 
ing a basic frequency equal to the repetition frequency. If 
longer parts are repeated, certain echo effects arise which 
are also found disturbing. 

In block-oriented transform encoders/decoders that employ a 
spectral representation of a temporal audio signal, the possi- 
bility would also exist of performing a spectral value predic- 
tion in the case of erroneous audio data. If it is established 
that spectral values in a block are erroneous, these spectral 
values can be predicted, i.e. estimated, on the basis of the 
spectral values of a preceding frame or a number of preceding 
frames. The predicted spectral values correspond within cer- 
tain limits to the erroneous spectral values if the audio sig- 
nal is relatively steady, i.e. if the audio signal is not sub- 
ject to any very fast changes in the signal envelope. If e.g. 
a method employing the MPEG AAC standard (ISO/IEC 13818-7 
MPEG-2 Advanced Audio Coding) is considered, a normal block or 
frame of encoded audio data has 1024 spectral values. For the 
method of spectral value prediction 1024 parallel operating 
predictors will therefore be needed in the decoder so that, if 
a complete frame is lost, all the spectral values can be pre- 
dicted. 



A disadvantage of this method is the relatively high computa- 
tional effort, which makes a real-time decoding of a received 
multimedia or audio data signal impossible at present. 

A further important disadvantage of this method results from 
the transform algorithm, namely the modified discrete cosine 
transform (MDCT), which is used. It is generally known that
the MDCT algorithm does not provide an ideal Fourier spectrum 
but a "spectrum" which deviates from an ideal Fourier spec- 
trum. Investigations have shown that a sine time function 
e.g., which has a Fourier spectrum with a single spectral line 
at the frequency of the sine function, has an MDCT "spectrum" 
which, while it has a dominant spectral coefficient at the 
frequency of the sine function, also has in addition further 
spectral coefficients at other frequency values. Furthermore, 
the height of an MDCT "spectrum" of a sine function does not 
remain the same from one frame to another but varies from 
frame to frame. Another fact is that the MDCT transform is not 
strictly energy conserving. What can be stated, therefore, is 
that, while the MDCT transform works exactly in conjunction 
with an inverse MDCT transform, the MDCT spectrum differs con- 
siderably from a Fourier spectrum. A spectral value prediction 
of MDCT spectral coefficients has thus shown itself to be 
inadequate when high precision is required. 
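
The frame-to-frame variation mentioned above can be reproduced
with a small sketch. It uses the textbook MDCT definition
without window or normalisation, a block size of N = 16 and a
sine placed exactly on the centre frequency of coefficient 3;
these parameters and the function names are illustrative
assumptions, not values from an actual codec.

```python
import math


def mdct(frame):
    """Unwindowed, unnormalised MDCT: 2N input samples give N coefficients."""
    n = len(frame) // 2
    return [
        sum(frame[i] * math.cos(math.pi / n * (i + 0.5 + n / 2.0) * (k + 0.5))
            for i in range(2 * n))
        for k in range(n)
    ]


def peak_per_frame(n=16, k0=3, frames=2):
    """Magnitude of coefficient k0 for successive frames of one steady sine."""
    omega = math.pi / n * (k0 + 0.5)  # centre frequency of coefficient k0
    peaks = []
    for f in range(frames):
        # Successive frames overlap by half, i.e. advance by n samples.
        frame = [math.sin(omega * (i + f * n)) for i in range(2 * n)]
        peaks.append(abs(mdct(frame)[k0]))
    return peaks
```

Although the sine itself does not change, the magnitude of its
dominant coefficient differs between the two frames because it
depends on the phase of the sine relative to the frame start.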

A further disadvantage of spectral value prediction, particu- 
larly in connection with modern audio coding methods, is that 
modern audio coding methods use different window lengths or 
window shapes. To prevent the quantization noise arising from 
the quantization of the MDCT spectral coefficients being 
"smeared" over a long block, i.e. the occurrence of pre- 
echoes, when there are rapid changes (transients or "attacks") 
in the audio signal to be encoded, modern transform encoders 
use short windows for transient audio signals, i.e. audio sig- 
nals with "attacks", to increase the temporal resolution at 
the expense of the frequency resolution. This means, however,
that for a spectral value prediction both the window length 
and the window shape (in addition there are transition windows 
to initiate windowing from short to long blocks and vice 
versa) must be constantly taken into account, which also in- 
creases the complexity of the spectral value prediction and 
would greatly affect the computational efficiency. 

DE 40 34 017 A1 relates to a method for detecting errors in
the transmission of frequency coded digital signals. From the
frequency coefficients of previous and, in some cases, future
frames, an error function is formed on the basis of which the 
occurrence of an error can be detected. An erroneous frequency 
coefficient is no longer included in the evaluation of subse- 
quent frames. 

DE 197 35 675 A1 discloses a method for concealing errors in 
an audio data stream. The spectral energy of a subgroup of in- 
tact audio data is calculated. After producing a pattern for 
substitute data using the spectral energy calculated for the 
subgroup of intact audio data, substitute data for erroneous 
or missing audio data corresponding to the subgroup are gener- 
ated according to the pattern. 

It is the object of the present invention to provide precise 
and flexible error concealment for audio signals which can be 
implemented with limited computational effort. 

This object is achieved by a method for error concealment ac- 
cording to claim 1 and a device for error concealment accord- 
ing to claim 12. 

A further object of the present invention is to provide error- 
tolerant and flexible decoding of audio signals. 

This object is achieved by a method for decoding an encoded 
audio signal according to claim 10 and a device for decoding 
an encoded audio signal according to claim 13. 

The present invention is based on the finding that the disad- 
vantages of the spectral value prediction, which reside in the 
dependence on the transform algorithm which is used and in the 
dependence on the window shape and block length, can be 
avoided by performing error concealment by means of a predic- 
tion which functions in the "quasi" time domain. To this end a 
set of spectral values which preferably corresponds to a long 
block or a number of short blocks is subdivided into sub- 
bands. A sub-band of the current set of spectral coefficients 
can then undergo a reverse transform so as to obtain a time 
signal corresponding to the spectral coefficients of the sub- 
band. To generate estimated values for a subsequent set of 
spectral coefficients, a prediction is performed on the basis 
of the time signal of this sub-band. 

It should be noted that this prediction takes place in the 
quasi time domain since the temporal signal on the basis of 
which the prediction is performed is simply the time signal of 
one sub-band of the encoded audio signal and not the time sig- 
nal of the whole spectrum of the audio signal. The time signal 
generated by prediction is subjected to a forward transform to 
obtain estimated, i.e. predicted, spectral coefficients for 
the sub-band of the following set of spectral coefficients. If 
it is now established that there are one or more erroneous spec- 
tral coefficients in the following set of spectral coeffi- 
cients, the erroneous spectral coefficients can be replaced by 
the estimated, i.e. predicted, spectral coefficients. 
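
The sequence of steps just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: an orthonormal DCT-IV, which is its own inverse, stands in for the reverse and forward transforms, and the predictor is passed in as a function. The names `dct4`, `estimate_next_set`, `conceal` and `hold` are hypothetical.

```python
import math

def dct4(values):
    """Orthonormal DCT-IV. It is its own inverse, so in this sketch it serves
    as both the reverse and the forward transform (an assumption)."""
    m = len(values)
    scale = math.sqrt(2.0 / m)
    return [
        scale * sum(v * math.cos(math.pi / m * (n + 0.5) * (k + 0.5))
                    for n, v in enumerate(values))
        for k in range(m)
    ]

def estimate_next_set(current_set, num_subbands, predict):
    """Subdivide, reverse transform, predict, forward transform, reassemble."""
    width = len(current_set) // num_subbands
    estimate = []
    for b in range(num_subbands):
        subband = current_set[b * width:(b + 1) * width]
        quasi_time = dct4(subband)              # reverse transform of one sub-band
        predicted_time = predict(quasi_time)    # prediction in the quasi time domain
        estimate.extend(dct4(predicted_time))   # forward transform
    return estimate

def conceal(received_set, estimate, erroneous):
    """Use the estimate wherever a received coefficient is flagged erroneous."""
    return [est if bad else rec
            for rec, est, bad in zip(received_set, estimate, erroneous)]

def hold(quasi_time):
    """Placeholder predictor: assumes the sub-band signal simply repeats."""
    return list(quasi_time)

current = [float(i % 5) for i in range(16)]
estimate = estimate_next_set(current, num_subbands=4, predict=hold)

# Pretend the following set equals the current one but arrives with two errors.
received = [99.0 if i in (2, 3) else c for i, c in enumerate(current)]
flags = [i in (2, 3) for i in range(16)]
repaired = conceal(received, estimate, flags)
```

With the placeholder predictor the estimate merely reproduces the current set; a real predictor would replace `hold` without any change to the surrounding steps.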

Compared to the pure spectral value prediction, the method ac- 
cording to the present invention for error concealment re- 
quires less computational effort since, as the spectral coef- 
ficients have been grouped together, predictions now have to 
be performed only for each sub-band and no longer for each 
spectral coefficient. Furthermore, the method according to the 
present invention provides a high degree of flexibility since 
the characteristics of the signals to be processed can be 
taken into account. 

The error concealment according to the present invention 
works particularly well for tonal signals. It has been discov- 
ered, however, that tonal signal portions are more likely to 
appear in the lower-frequency range of the spectrum of an au- 
dio signal, while the higher-frequency signal portions are 
more likely to be unsteady, i.e. noisy. In terms of the pre- 
sent description, "noisy signal portions" are signal portions 
which are far from steady. These noisy signal portions do not 
have to represent noise in the classical sense, however, but 
simply rapidly changing user signals. 

To enable the computational effort to be reduced still fur- 
ther, it is possible with the present invention to subject 
only the lower-frequency signal portions to a prediction 
whereas higher-frequency signal portions are not processed at 
all. In other words, it is possible to subject only the low- 
est/lower sub-band(s) to a reverse transform, a prediction and 
a forward transform. 

This characteristic of the present invention, in contrast to a 
complete transforming of the whole audio signal into the time 
domain and a prediction of the whole temporal audio signal 
from block to block using a so-called "long-term" predictor, 
constitutes a considerable advantage, since according to the 
present invention the advantages of prediction in the time do- 
main are combined with the advantages of spectral decomposi- 
tion. Only with spectral decomposition is it possible to take 
account of audio signal characteristics which depend on the 
frequency. The number of sub-bands generated from the subdivi- 
sion of the set of spectral coefficients is arbitrary. If only 
two sub-bands are chosen, the advantage of considering the to- 
nality already manifests itself in the lower frequency range 
of the audio signal. If on the other hand many sub-bands are 
chosen, the predictor in the quasi time domain will have a 
relatively short length such that its delay does not become too 
large. Since the individual sub-bands are preferably processed 
in parallel, an embodiment of the present invention using a 
hard-wired integrated circuit would require a plurality of 
predictor circuits in parallel. 

If the present invention is employed in connection with a 
transform encoder which uses different block lengths, the ad- 
vantage results that the predictor itself is independent of 
block length and window shape. In addition, due to the reverse 
transform, the dependence on the transform algorithm used, ex- 
plained above in relation to the MDCT, is eliminated. Further- 
more, the concept according to the present invention for error 
concealment furnishes estimated spectral coefficients which, 
due to the reverse transform, the prediction in the time do- 
main and the forward transform, have the right phase, i.e. 
there are no phase jumps in the time signal resulting from a 
predicted spectral coefficient in relation to a time signal of 
a preceding intact set of spectral coefficients. As a result 
tonal signals can be substituted for erroneous or missing 
signal portions so well that a normal listener does not even 
realize in most cases that an error has occurred. 

Finally, the method according to the present invention is par- 
ticularly suited for combination with an error concealment 
technique described in DE 197 35 675 A1, which is suitable for 
the substitution of noisy signal portions. If tonal signal 
portions of a missing block are concealed by means of the 
method according to the present invention, and if noisy signal 
portions are concealed by means of the known method which has 
just been cited, which is based on an energy similarity be- 
tween substituted data and intact data, completely missing 
blocks can be concealed to such an extent as to be practically 
inaudible for a normal listener. 



Preferred embodiments of the present invention are described 
in detail below making reference to the enclosed drawings, in 
which 



Fig. 1 shows a decoder having an error concealment unit ac- 
cording to the present invention; 

Fig. 2 shows a detailed block diagram of the error conceal- 
ment unit of Fig. 1; 

Fig. 3 shows a detailed block diagram of the error conceal- 
ment unit of Fig. 1 which also provides noise substitu- 
tion and which works according to the prediction gain; 

Fig. 4 shows a flowchart for the method for error concealment 
according to the present invention; 

Fig. 5 shows a detailed block diagram of a preferred embodi- 
ment of the error concealment unit for an MPEG-2 AAC 
decoder; 

Fig. 6 shows a detailed block diagram of the predictor of 
Fig. 5; and 

Fig. 7 shows a schematic representation of the block struc- 
ture according to the AAC standard. 



Fig. 1 shows a block diagram of a decoder according to a pre- 
ferred embodiment of the present invention. The decoder block 
diagram shown in Fig. 1 corresponds essentially to the MPEG-2 
AAC decoder as defined in the standard MPEG-2 AAC 13818-7. The 
encoded audio signal is first fed into a bit stream demulti- 
plexer 100 in order to separate spectral data and side infor- 
mation. The Huffman coded spectral coefficients are then fed 
into a Huffman decoder 200 so as to obtain quantized spectral 
values from the Huffman code words. The quantized spectral 
values are then fed into an inverse quantizer 300 and the re- 
spective scale factor bands are then multiplied by appropriate 
scale factors. The decoder according to the present invention 
can incorporate a plurality of additional functional units 
following the inverse quantizer 300, e.g. a middle/side stage, 
a predictor stage, a TNS stage, etc., as specified in the 
standard. 

According to a preferred embodiment of the present invention 
the decoder includes an error concealment unit 500 which imme- 
diately precedes a synthesis filter bank 400 and which func- 
tions according to the present invention and which ensures 
that the effects of transmission errors in the encoded audio 
signal fed into the bit stream demultiplexer 100 can be miti- 
gated or made completely inaudible. In other words, the error 
concealment unit 500 ensures that transmission errors are con- 
cealed, i.e. that they are not or are only faintly audible in 
a temporal audio signal at the output of the synthesis filter 
bank. 

Fig. 2 shows a general block diagram of the error concealment 
unit 500. This includes a reverse transform unit 502, a unit 
504 for generating estimated values and a forward transform 
unit 506. Both the reverse transform unit 502 and the forward 
transform unit 506 can be controlled according to the current 
block type via a block type line 508. The error concealment 
unit 500 also includes a parallel branch which enables the 
spectral coefficients on the input side to be routed directly 
from the input to the output bypassing the reverse transform 
unit 502, the unit for generating estimated values 504 and the 
forward transform unit 506. This parallel branch contains a 
time delay stage 510 so as to ensure that estimated spectral 
coefficients for a subsequent block which appear behind the 
forward transform unit 506 arrive at an error selection unit 
512 simultaneously with "real", possibly erroneous spectral 
coefficients for the subsequent block, so that it is possible 
to replace any erroneous spectral coefficients in the real 
spectral coefficients for the subsequent block by estimated 
spectral coefficients for the subsequent block. This spectral 
value replacement is represented in Fig. 2 by a switch symbol 
512. It should be noted that the error replacement unit 512 
can operate on a spectral value level, or on a block or set 
level. Depending on the requirements, it can also operate on 
the sub-band level. The subsequent set of spectral coeffi- 
cients, wherein any originally erroneous spectral coefficients 
have been replaced by estimated spectral coefficients, i.e. 
wherein errors have been concealed, thus appears at the output 
of the error replacement unit 512. 
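
The different replacement levels mentioned above can be illustrated with a short sketch. It assumes that error flags are available per spectral coefficient and that all sub-bands have the same width; the function names `widen_flags` and `replace_errors` are hypothetical, and the time delay stage 510 is represented only by the fact that the real and the estimated coefficients are handed over together.

```python
def widen_flags(flags, level, subband_width):
    """Expand per-coefficient error flags to the chosen replacement level."""
    if level == "coefficient":
        return list(flags)
    if level == "subband":
        widened = []
        for start in range(0, len(flags), subband_width):
            chunk = flags[start:start + subband_width]
            widened.extend([any(chunk)] * len(chunk))
        return widened
    if level == "set":
        return [any(flags)] * len(flags)
    raise ValueError(f"unknown level: {level}")

def replace_errors(delayed_real, estimated, flags):
    """Switch 512: pass real coefficients, substitute estimates where flagged."""
    return [e if bad else r for r, e, bad in zip(delayed_real, estimated, flags)]

real = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
estimated = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
flags = [False, False, True, False, False, False, False, False]

per_coefficient = replace_errors(real, estimated, widen_flags(flags, "coefficient", 4))
per_subband = replace_errors(real, estimated, widen_flags(flags, "subband", 4))
per_set = replace_errors(real, estimated, widen_flags(flags, "set", 4))
```

A single erroneous coefficient thus leads to the replacement of one value, of one sub-band or of the whole set, depending on the level selected.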

It should be pointed out here that the block diagram shown in 
Fig. 2 represents only a part of the error concealment unit 
500. This representation has however been chosen for reasons 
of clarity. As will be described in more detail in Fig. 5 with 
reference to a preferred embodiment of the present invention, 
the circuit shown in Fig. 2 is preceded by a unit for subdi- 
viding into sub-bands. As a counterpart thereto, the error re- 
placement unit 512 is followed by a unit for cancelling the 
subdivision into sub-bands so that the filter bank 400 (Fig. 
1) receives a "normal" set of spectral coefficients without 
noticing anything about the preceding error concealment. The 
error concealment unit 500 (Fig. 1) thus includes a plurality 
of the circuits described with reference to Fig. 2, namely one 
circuit per sub-band. The parallel circuits are connected on 
the input side by the unit for subdividing and on the output 
side by the unit for cancelling the subdivision, as will be 
described in detail later. 

It has already been pointed out that modern transform encoders 
use short windows so as to increase the temporal resolution in 
the event of transients in an audio signal which is to be en- 
coded. Here it is usually the case that the number of temporal 
sampled values or the number of spectral coefficients in a 
long window or block is an integral multiple of the number of 
temporal sampled values or the number of spectral coefficients 
in a short window or block. An advantage of the present inven- 
tion is that the unit 504 for generating estimated values can 
operate independently of the transform, the block length and 
the window type which are used. Both the reverse transform 
unit 502 and the forward transform unit 506 are therefore con- 
trolled according to the block type so that the same number of 
temporal sampled values is always presented to or emerges from 
the unit 504 for generating estimated values. 

This property will now be illustrated further by making use of 
Fig. 7 to represent the situation for MPEG-2 AAC. Fig. 7 has a 
time axis 700 in terms of which the extent of a long block 702 
is represented. A long block comprises 2048 sampled values, 
resulting in 1024 spectral coefficients if the windows overlap 
by 50% as is known. Background details of the modified dis- 
crete cosine transform (MDCT) which is used and window over- 
lapping are to be found in the already cited standard. In Fig. 
7 eight short blocks 704 are also depicted, each of which has 
256 sampled values, again resulting in 128 spectral coeffi- 
cients due to the 50% overlap. For reasons of clarity, the 
overlapping of the short blocks and the overlapping of the 
long block with a preceding long block or with a preceding or 
subsequent start or stop window have not been shown in Fig. 7. 
However, it is clear from Fig. 7 that the number of spectral 
coefficients in a long block is equal to eight times the num- 
ber of spectral coefficients in a short block. Put another 
way, a long block encompasses the same time duration of the 
audio signal as do eight short blocks. 

As is shown in Fig. 2, the reverse transform unit 502 is con- 
trolled via the block type line 508 in such a way that it per- 
forms eight successive reverse transforms of the spectral co- 
efficients in the corresponding sub-bands of short blocks and 
arranges the resulting quasi time signals serially next to one 
another so as to provide the unit 504 for generating estimated 
values with a time signal of a certain length. As a counter- 
part to this, the forward transform unit 506 will also perform 
eight successive forward transforms on the values which are 
issued serially by the unit 504 for generating estimated val- 
ues. This "operating cycle" thus ensures that in the case of 
short blocks the same number of spectral coefficients is out- 
put as in the case of long blocks. The spectral coefficients 
which are output by the error concealment unit 500 in an "op- 
erating cycle" are termed a set of estimated spectral coeffi- 
cients in the sense of the present invention. On the grounds 
of practicability the number of spectral coefficients in a set 
is the same as the number of spectral coefficients in a long 
block and the number of spectral coefficients in eight short 
blocks. It is obvious that other ratios between long and short 
block can be chosen, e.g. 2, 4 or 16. Normally the situation 
will be such that the number of spectral coefficients in a 
long block will be divisible by the number of spectral coeffi- 
cients in a short block. Should this not be so for some rea- 
son, however, the number of spectral coefficients in a set 
would be equal to the least common multiple of long and short 
blocks so as to achieve independence from the block type at 
the predictor level, i.e. in the unit 504 for generating esti- 
mated values. 
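
The relationship between block lengths, set size and the number of transforms per operating cycle can be written down in a few lines. The sketch below only restates the arithmetic of the preceding paragraph; the function name `operating_cycle` and the second example (960 and 128 coefficients) are hypothetical and serve merely to show the least-common-multiple case.

```python
import math

def operating_cycle(long_coeffs, short_coeffs):
    """Return the set size and the number of transforms per set.

    The set size is the least common multiple of the two block sizes, so
    the predictor always sees the same number of values per cycle.
    """
    set_size = long_coeffs * short_coeffs // math.gcd(long_coeffs, short_coeffs)
    return {
        "set_size": set_size,
        "long_transforms": set_size // long_coeffs,
        "short_transforms": set_size // short_coeffs,
    }

aac_cycle = operating_cycle(1024, 128)      # block sizes given in the text
other_cycle = operating_cycle(960, 128)     # hypothetical, not divisible
```

For the block sizes named in the text this yields one reverse transform per sub-band for a long block and eight for short blocks, with 1024 coefficients per set in both cases.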

Fig. 3, which represents a preferred development of the error 
concealment unit of Fig. 2, will now be considered. An impor- 
tant feature here is that the error concealment unit has been 
provided with a noise replacement unit 514 which, in place of 
the forward transform unit 506, can be connected to the error 
replacement unit via a noise replacement switch 518 depending 
on a prediction gain signal 516. The noise replacement unit 
514 operates according to the method described in DE 197 35 
675 A1 so as to approximate noisy signal content. Since noisy 
signal content is involved, the phase of the spectral coeffi- 
cients is no longer considered but simply the energy of a num- 
ber of spectral coefficients in a subgroup. Depending on the 
energy in a subgroup of the last intact audio data, the noise 
replacement unit 514 generates a corresponding subgroup of 
spectral coefficients, the energy in the subgroup of generated 
spectral coefficients equalling the energy of the correspond- 
ing subgroup of the preceding spectral coefficients or being 
derived from it. The phases of the spectral coefficients gen- 
erated in the noise replacement process are, however, speci- 
fied randomly. 
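
The principle of energy-matched noise replacement can be sketched as follows. This is not the method of DE 197 35 675 A1 itself but a simplified illustration under assumptions: the subgroups have a fixed size, the substitute values are drawn from a uniform distribution so that their signs are random, and each subgroup is scaled to exactly the energy of the last intact subgroup. The name `noise_substitute` is hypothetical.

```python
import math
import random

def noise_substitute(last_intact, group_size, rng):
    """Generate random coefficients whose energy per subgroup matches the
    energy of the corresponding subgroup of the last intact data."""
    substitute = []
    for start in range(0, len(last_intact), group_size):
        group = last_intact[start:start + group_size]
        target_energy = sum(c * c for c in group)
        noise = [rng.uniform(-1.0, 1.0) for _ in group]
        noise_energy = sum(v * v for v in noise)
        gain = math.sqrt(target_energy / noise_energy) if noise_energy > 0 else 0.0
        substitute.extend(gain * v for v in noise)
    return substitute

rng = random.Random(0)                       # fixed seed for reproducibility
previous = [3.0, 4.0, 0.5, -0.5, 0.0, 2.0]   # last intact coefficients (made up)
substitute = noise_substitute(previous, 2, rng)
```

Only the energy per subgroup is carried over from the intact data; the individual values, and thus the phases, are left to chance.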

The noise replacement switch 518 is controlled by a prediction 
gain signal 516. In general the prediction gain depends on the 
way the output signal of the unit 504 for generating estimated 
values relates to the input signal. If it is found that the 
output signal in a sub-band is substantially the same as the 
input signal, it can be assumed that the audio signal in this 
sub-band is relatively steady, i.e. tonal. If, on the other 
hand, the output signal of the predictor differs markedly from 
the input signal, it can be assumed that the audio signal in 
this sub-band is relatively unsteady, i.e. atonal or noisy. In 
this case a noise replacement will provide better results than 
a prediction since noisy signals cannot per se be reliably 
predicted. The noise replacement switch 518 could, for exam- 
ple, be so controlled that it connects the forward transform 
unit 506 to the error replacement unit 512 when the prediction 
gain exceeds a certain threshold and connects the noise re- 
placement unit 514 to the error replacement unit 512 when the 
prediction gain does not exceed this threshold, thus combining 
the two substitution methods in an optimal way. 
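
A possible control rule for the noise replacement switch 518 is sketched below. The text does not define the prediction gain numerically; the definition used here (ratio of signal energy to prediction error energy in decibels) is a common convention and is an assumption, as is the threshold of 6 dB. The function names are hypothetical.

```python
import math

def prediction_gain_db(actual, predicted):
    """Signal energy over prediction error energy, in dB (assumed definition)."""
    signal_energy = sum(a * a for a in actual)
    error_energy = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    if error_energy == 0.0:
        return math.inf
    if signal_energy == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal_energy / error_energy)

def choose_source(gain_db, threshold_db=6.0):
    """Select predicted coefficients for tonal sub-bands, noise otherwise."""
    return "prediction" if gain_db > threshold_db else "noise"

steady = [1.0, -1.0, 1.0, -1.0]
good_gain = prediction_gain_db(steady, [0.9, -0.9, 0.9, -0.9])
poor_gain = prediction_gain_db(steady, [0.0, 0.0, 0.0, 0.0])
```

A sub-band whose predictor output follows the input closely obtains a high gain and is concealed by prediction, whereas a sub-band that cannot be predicted falls back to noise replacement.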

The method of error concealment according to the present in- 
vention will now be considered in more detail making reference 
to Fig. 4. First, a current set of spectral coefficients is 
received (10). For reasons of clarity it is assumed in Fig. 4 
that the current set of spectral coefficients consists en- 
tirely of intact spectral coefficients or has already been 
subjected to an error concealment method as shown in Fig. 2 or 
Fig. 3. On the one hand the current set of spectral coeffi- 
cients is processed by the filter bank 400 (Fig. 1) and output 
e.g. to a loudspeaker (12). On the other hand the current set 
of spectral coefficients is used to predict or estimate a sub- 
sequent set of spectral coefficients. To achieve this accord- 
ing to the present invention the current set of spectral coef- 
ficients is subdivided into sub-bands (14). In the case of a 
long block the subdivision into sub-bands is effected by gen- 
erating just one sub-band with a corresponding frequency range 
for each set. In the case of short blocks the current set of 
spectral coefficients will consist of a plurality of succes- 
sive complete spectra. Then, in step 14, corresponding sub- 
bands are generated for each complete spectrum, i.e. a plural- 
ity of sub-bands for each set of spectral coefficients. 

After subdivision into sub-bands a reverse transform is per- 
formed for each sub-band (16). In the case of long blocks, 
where the number of spectral coefficients in a block is equal 
to the number of spectral coefficients in a set, a single re- 
verse transform is performed for each sub-band prior to the 
prediction 18. In the case of short blocks several reverse 
transforms corresponding to the sub-bands of each "short" 
spectrum are performed before a prediction 18 is effected for 
all the sub-bands together. 

The prediction 18 takes place in the quasi time domain, i.e. 
for each sub-band "time" signal, so as to obtain an estimated 
sub-band time signal for the subsequent set. This estimated 
quasi time signal is then subjected to a forward transform 20, 
again once only for a long block and N times for short blocks, 
N being the ratio of the number of spectral coefficients of a 
long block to the number of spectral coefficients of a short 
block. 

After step 20 estimated spectral coefficients are available 
for each sub-band. In a step 22 the subdivision introduced in 
step 14 is revoked again so that a subsequent set of spectral 
coefficients is obtained after step 22. 

In a step 24 the subsequent set of spectral coefficients is 
received by the decoder. This set undergoes error detection 26 
in order to establish whether one spectral coefficient, sev- 
eral spectral coefficients or all spectral coefficients of the 
subsequent set are erroneous. The error detection is effected 
in a way which is known to persons skilled in the art, e.g. by 
checking the CRC checksum (CRC = Cyclic Redundancy Code) over 
a block. If it is found that a checksum that is calculated on 
the basis of the transmitted data differs from the checksum 
transmitted with the data, the estimated spectral coefficients 
generated by step 22 can be adopted instead of the spectral 
coefficients of the erroneous block. The erroneous spectral 
coefficients are thus replaced by the estimated spectral coef- 
ficients (28). Finally the error-concealed spectral coeffi- 
cients of the subsequent set are processed so as to be able to 
output the temporal sampled values (30). 
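
Steps 26 and 28 can be sketched with a checksum comparison over one block. The CRC-32 from the Python standard library is used here purely as a stand-in; the checksum actually carried in an AAC bit stream is not modelled, and the function names `block_is_intact` and `select_coefficients` are hypothetical.

```python
import zlib

def block_is_intact(payload: bytes, transmitted_crc: int) -> bool:
    """Step 26: compare the recomputed checksum with the transmitted one."""
    return zlib.crc32(payload) == transmitted_crc

def select_coefficients(decoded, estimated, payload, transmitted_crc):
    """Step 28: adopt the estimated set if the block is found to be erroneous."""
    if block_is_intact(payload, transmitted_crc):
        return decoded
    return estimated

payload = b"frame"                      # made-up block contents
crc = zlib.crc32(payload)               # checksum formed at the sender
corrupted = b"frbme"                    # one byte altered in transmission

kept = select_coefficients([1.0, 2.0], [1.1, 1.9], payload, crc)
concealed = select_coefficients([1.0, 2.0], [1.1, 1.9], corrupted, crc)
```

In this block-level form the whole set is exchanged; a finer selection requires error information per sub-band or per coefficient.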

The flowchart of Fig. 4 essentially represents a snapshot of 
the processing which takes place from one set of spectral co- 
efficients to the next set of spectral coefficients. If the 
flowchart of Fig. 4 is implemented it is obvious that e.g. 
only a single filter bank 400 (Fig. 1) is used to perform the 
steps 12 and 30. Equally, it is obvious that only a single 
unit is needed to receive the current set of spectral coeffi- 
cients and to receive the subsequent set of spectral coeffi- 
cients to implement the steps 10 and 24. Temporal synchronic- 
ity for the steps 10 and 24 in a device which implements the 
method according to the present invention is ensured by the 
time delay stage 510 in the parallel branch (Fig. 2). 

Fig. 5 shows a more detailed representation of the general 
block diagram of Fig. 2 for the example of an MPEG-2 AAC 
transform encoder featuring the error concealment unit 500 ac- 
cording to the present invention. As has already been ex- 
plained with reference to Fig. 2, the error concealment unit 
500 (Fig. 1) includes a unit 520 for subdividing the blocks of 
spectral coefficients into, preferably, 32 sub-bands. In the 
case of long blocks each sub-band has 32 spectral coeffi- 
cients. Since the sub-bands of the short blocks span the same 
frequency range, each sub-band has 4 spectral coefficients in 
the case of short blocks. A subdivision of a complete spectrum 
into sub-bands of the same size is preferred on the grounds of 
simplicity, though a subdivision into unequal sub-bands would 
also be possible, e.g. to reflect the psychoacoustical fre- 
quency groups. Each sub-band is then subjected to an inverse 
modified discrete cosine transform. In the case of long blocks 
the IMDCT is performed once and receives 32 input values. In 
the case of short blocks eight successive IMDCTs are per- 
formed, each with 4 of the spectral coefficients, so that 32 
quasi time sampled values again result at the output. These 
are then passed on to the predictor 504, which in turn gener- 
ates 32 estimated quasi time sampled values which are trans- 
formed by the MDCT 506. In the case of long blocks a single 
MDCT is performed with 32 temporal values, whereas in the case 
of short blocks eight successive MDCTs are performed, each 
having 4 sampled values. Although only one branch for the 0-th 
sub-band is shown in Fig. 5, it should be noted that an iden- 
tical branch exists for each sub-band if all the sub-bands are 
of the same length. If the sub-bands are of different lengths, 
the orders of the IMDCT or MDCT are adapted accordingly. For 
the purposes of a practical implementation an obvious choice 
is parallel processing. Obviously, however, serial processing 
of the sub-bands is also possible, if sufficient storage ca- 
pacity is available. The output values of the MDCT 506 for 
each sub-band are fed to a unit 522 for reversing the subdivi- 
sion, i.e. into an inverse subdivision unit, so as to output 
an estimated set of spectral values for the preferred embodi- 
ment at the AAC MDCT level. 
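
The bookkeeping of units 520 and 522 can be illustrated as follows. The sketch assumes equal-width sub-bands and represents a set either as one long block or as eight short blocks; the transforms themselves are omitted, and the function names `subdivide` and `undo_subdivision` are hypothetical.

```python
def subdivide(blocks, num_subbands=32):
    """Unit 520: split each spectrum into sub-bands of equal width.

    blocks is a list of spectra (one long block or eight short blocks).
    The result holds, for every sub-band, one chunk per block; each chunk
    is the input of one reverse transform.
    """
    width = len(blocks[0]) // num_subbands
    return [[blk[b * width:(b + 1) * width] for blk in blocks]
            for b in range(num_subbands)]

def undo_subdivision(subbands):
    """Unit 522: reassemble complete spectra from the sub-band chunks."""
    num_blocks = len(subbands[0])
    return [[c for band in subbands for c in band[i]]
            for i in range(num_blocks)]

long_set = [list(range(1024))]                                   # one long block
short_set = [list(range(i * 128, (i + 1) * 128)) for i in range(8)]

long_bands = subdivide(long_set)
short_bands = subdivide(short_set)

# Values handed to the predictor per sub-band in each case.
long_count = sum(len(chunk) for chunk in long_bands[0])
short_count = sum(len(chunk) for chunk in short_bands[0])
```

In both cases each sub-band supplies 32 values per set, either as one chunk of 32 coefficients or as eight chunks of 4 coefficients, so that the predictor input has the same length regardless of the block type.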

Fig. 6 shows a further detailed representation of the predic- 
tor 504. The heart of the predictor 504 in the preferred em- 
bodiment is a so-called LMSL predictor 504a with a length of n 
= 32. Details of the LMSL predictor can be found in the book 
"Adaptive Signal Processing", Bernard Widrow, Samuel Stearns, 
Prentice-Hall, 1995, p. 99 ff. The LMSL predictor 504a is pre- 
ceded by a time delay stage 504b. The predictor 504 also in- 
cludes a parallel-series converter 504c on the input side and 
a series-parallel converter 504d on the output side. It also 
has a prediction gain calculator 504e which compares the out- 
put signal of the predictor 504a with the input signal in or- 
der to establish whether a steady signal or an unsteady signal 
has been processed. On the output side the prediction gain 
calculator 504e supplies the prediction gain signal 516, which 
is used to control the switch 518 (Fig. 3) so as to employ ei- 
ther predicted spectral coefficients or spectral coefficients 
gained by noise substitution for the purposes of error con- 
cealment. In its implementation as LMSL predictor the predic- 
tor 504 also includes two switches 504f and 504g, which have 
two switch settings. The switch setting "1" applies when the 
spectral coefficients of the subsequent block are error-free 
and the switch setting "2" applies when the spectral coeffi- 
cients of the subsequent set are erroneous. Fig. 6 shows the 
case where the spectral coefficients are erroneous. In this 
case a reference signal with a value of 0 is fed into the pre- 
dictor at the switch 504g instead of the input signal. In the 
case of error-free spectral coefficients (switch setting "1" 
of the switch 504g), on the other hand, the output values of 
the parallel-series converter are fed into the LMSL predictor 
from below. 
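
The behaviour of an adaptive predictor in the two operating cases can be illustrated with a simplified sketch. A transversal normalized LMS predictor is used here in place of the LMSL predictor named in the text, and the handling of the erroneous case (weights frozen, own output fed back) is an assumption of this sketch rather than a reproduction of the switch wiring of Fig. 6. The class name, the order of 8 and the step size of 0.5 are hypothetical.

```python
import math

class NlmsPredictor:
    """One-step-ahead normalized LMS predictor (stand-in for the LMSL predictor)."""

    def __init__(self, order=8, step=0.5, eps=1e-9):
        self.weights = [0.0] * order
        self.history = [0.0] * order      # most recent sample first
        self.step = step
        self.eps = eps

    def predict(self):
        return sum(w * h for w, h in zip(self.weights, self.history))

    def update(self, sample):
        """Error-free case: adapt on the true sample, then store it."""
        error = sample - self.predict()
        norm = self.eps + sum(h * h for h in self.history)
        self.weights = [w + self.step * error * h / norm
                        for w, h in zip(self.weights, self.history)]
        self.history = [sample] + self.history[:-1]
        return error

    def extrapolate(self, count):
        """Erroneous case: keep the weights fixed and feed predictions back."""
        estimates = []
        for _ in range(count):
            estimate = self.predict()
            estimates.append(estimate)
            self.history = [estimate] + self.history[:-1]
        return estimates

omega = 0.2 * math.pi
tone = [math.sin(omega * n) for n in range(432)]   # steady, i.e. tonal, signal

predictor = NlmsPredictor()
errors = [predictor.update(s) for s in tone[:400]]   # intact data
estimated = predictor.extrapolate(32)                # 32 values for a lost set
worst = max(abs(e - t) for e, t in zip(estimated, tone[400:]))
```

For a steady sinusoid the extrapolated values continue the signal with the correct phase, which corresponds to the behaviour described above for tonal signal portions.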

If the error concealment method according to the present in- 
vention is used in connection with an AAC encoder, the pre- 
ferred option is to use the corresponding transform algorithms 
(MDCT or IMDCT) for all the forward and reverse transforms. 
For error concealment it is not, however, necessary that the 
same transform method is employed for the reverse or forward 
transform as was used when encoding the audio signal to form 
the spectral coefficients. 

Due to the subdivision of the spectrum into sub-bands and due 
to the individual transforms for each sub-band, frequency-time 
domain transforms of lower order than the frequency resolution 
are used appropriately for each sub-band. As a result special 
estimated values for tonal signal portions are generated in 
the intermediate level by means of the predictor. Time- 
frequency domain transforms of lower order than the original 
frequency resolution are used appropriately as forward trans- 
form/synthesis, the same order being chosen as for the fre- 
quency-time domain transform which is used. Thus error con- 
cealment according to the present invention provides flexibil- 
ity through using advance knowledge of the spectral properties 
of audio signals and also independence from the transform 
method used in the encoder through the generation of estimated 
values in the quasi time signal, i.e. not at the spectral co- 
efficient level. If the prediction in the quasi time domain is 
used to replace tonal signal portions and if the noise re- 
placement is used for noisy spectral portions, errors for a 
large class of audio signals can be concealed to such an ex- 
tent that, even in the case of complete block loss, there is 
practically no audible disturbance. Trials have shown that, 
for not too critical test signals, normal listeners, i.e. un- 
trained test listeners, have heard irregularities in the audio 
signal only in one case out of 10 even when there has been 
complete block loss. 

Claims 

1. A method for concealing an error in an encoded audio sig- 
nal, where the encoded audio signal has successive sets 
of spectral coefficients, where a set of spectral coeffi- 
cients is a spectral representation for a set of audio 
sampled values, comprising the following steps: 

subdividing (14) a current set of spectral coefficients 
into at least two sub-bands with different frequency 
ranges, where one sub-band of the at least two sub-bands 
has at least two spectral coefficients; 

reverse transforming (16) the spectral coefficients of 
the one sub-band to obtain a temporal representation of 
the at least two spectral coefficients of the one sub- 
band; 

performing (18) a prediction using the temporal represen- 
tation of the at least two spectral coefficients of the 
one sub-band to obtain an estimated temporal representa- 
tion for a sub-band of a set following the current set, 
where the sub-band of the following set has the same fre- 
quency range as the sub-band of the current set; 

forward transforming (20) the estimated temporal repre- 
sentation to obtain at least two estimated spectral coef- 
ficients for the sub-band of the following set; 

determining (26) whether a spectral coefficient of the 
sub-band of the following set is erroneous; and 

as reaction to the step of determining, if there is an 
erroneous spectral coefficient, using (28) an estimated 
spectral coefficient instead of an erroneous spectral co- 
efficient of the following set so as to conceal the erro- 
neous spectral coefficient of the following set. 

2. A method according to claim 1, wherein the one sub-band 
that is processed in the step of reverse transforming 
(16) has low-frequency spectral coefficients and the 
other of the at least two sub-bands has higher-frequency 
spectral coefficients. 

3. A method according to claim 1 or 2, wherein the number of 
spectral coefficients in a set of spectral coefficients 
is equal to the number of spectral coefficients in a 
block (702) of the first length and is N times the number 
of spectral coefficients in a block (704) of the second 
length, and wherein N blocks (704) of the second length 
follow each other, where 

the step of subdividing (14) is performed in such a way 
that the sub-bands of the blocks of the first length have 
the same frequency ranges as the sub-bands of the blocks 
of the second length, so that the number of spectral co- 
efficients of a sub-band of the block of the first length 
is equal to N times the number of spectral coefficients 
of the corresponding sub-band of the block of the second 
length; 

the step of reverse transforming (16) is performed in 
succession for each corresponding sub-band of the N 
blocks of the second length to obtain a temporal repre- 
sentation of the spectral coefficients of the correspond- 
ing sub-bands of the N blocks of the second length; 

the step of performing (18) a prediction is effected with
the temporal representation of all the corresponding
sub-bands of the N blocks of the second length; and

the step of forward transforming (20) is performed suc- 
cessively for each corresponding sub-band of the N blocks 
of the second length. 
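
The short-block handling of claim 3 can be sketched as follows. This is an illustrative assumption rather than the claimed implementation: the orthonormal DCT-IV stands in for the transform, the temporal pieces of the N short blocks are simply concatenated before prediction, and the function names and the tiny block sizes are hypothetical.

```python
import math

def dct_iv(x):
    """Orthonormal DCT-IV, used here as both reverse and forward transform."""
    n = len(x)
    scale = math.sqrt(2.0 / n)
    return [scale * sum(x[j] * math.cos(math.pi / n * (j + 0.5) * (k + 0.5))
                        for j in range(n))
            for k in range(n)]

def subband_temporal(short_blocks, band_index, band_width):
    """Reverse transform (16) the same sub-band of each short block in
    succession and concatenate the results into one temporal signal."""
    temporal = []
    for block in short_blocks:
        band = block[band_index * band_width:(band_index + 1) * band_width]
        temporal.extend(dct_iv(band))
    return temporal

def short_block_estimates(predicted_temporal, band_width):
    """Forward transform (20) the predicted signal piece by piece, giving
    one estimated sub-band per short block."""
    return [dct_iv(predicted_temporal[i:i + band_width])
            for i in range(0, len(predicted_temporal), band_width)]

# N = 2 short blocks of 4 coefficients each, 2 sub-bands of width 2.
short_blocks = [[1.0, 0.5, 0.2, 0.1], [0.9, 0.4, 0.3, 0.0]]
temporal = subband_temporal(short_blocks, band_index=0, band_width=2)
round_trip = short_block_estimates(temporal, band_width=2)
```

The concatenated temporal signal of a sub-band has N times the short-block sub-band width, which is the same length the sub-band of one long block would yield, so a single predictor can serve both block types.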

4. A method according to one of the preceding claims, 
wherein a plurality of sub-bands is generated in the step 
of subdividing (14) such that all the sub-bands together 
form the spectral representation of the encoded audio 
signal in a set of spectral coefficients. 

5. A method according to one of the preceding claims, 
wherein the following step is performed after the step of 
determining (26) whether a spectral coefficient of a sub- 
band is erroneous: 

determining (504e) whether the spectral coefficient 
represents a tonal portion of the uncoded audio signal by 
comparing the spectral coefficient with the corresponding 
estimated spectral coefficient; 

if the spectral coefficient is found to be tonal, using 
the estimated spectral coefficient, and, if the spectral 
coefficient is found to be non-tonal, performing a noise 
substitution (514) for an erroneous spectral coefficient 
of the following set. 
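
The tonal/non-tonal decision of claim 5 might look like the sketch below. The claim only states that a coefficient is compared with its estimate; the relative-error threshold, the uniform noise generator, and all names are assumptions chosen for illustration.

```python
import random

def is_tonal(coefficient, estimate, max_relative_error=0.25):
    """Treat a coefficient as tonal when the prediction matched it closely.
    The threshold value is hypothetical."""
    magnitude = max(abs(coefficient), 1e-12)  # avoid division by zero
    return abs(coefficient - estimate) / magnitude <= max_relative_error

def replacement(estimate, tonal, noise_level, rng):
    """Use the estimate for tonal coefficients, otherwise substitute noise
    (514) drawn uniformly from [-noise_level, noise_level]."""
    if tonal:
        return estimate
    return rng.uniform(-noise_level, noise_level)

rng = random.Random(0)
tonal_value = replacement(0.9, is_tonal(1.0, 0.9), noise_level=0.1, rng=rng)
noise_value = replacement(-0.5, is_tonal(1.0, -0.5), noise_level=0.1, rng=rng)
```

Here the first coefficient was predicted to within 10 percent and keeps its estimate, while the second was predicted poorly and receives a noise value bounded by the chosen level.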

6. A method according to one of the claims 3 to 5, wherein 
the spectral coefficients are MDCT coefficients, the 
length of a set corresponds to the length of a long block 
and has 1024 MDCT coefficients, while a set of spectral 
coefficients comprises eight short-length blocks, each 
with 128 MDCT coefficients, and wherein 32 sub-bands,
each with 32 MDCT coefficients for a long block or each
with 4 MDCT coefficients for a short block, are formed in
the step of subdividing.
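
The figures in claim 6 are mutually consistent, as the following short check shows. The constant and function names are hypothetical; only the numbers come from the claim.

```python
LONG_BLOCK = 1024          # MDCT coefficients in a long block
SHORT_BLOCK = 128          # MDCT coefficients in a short block
SHORT_BLOCKS_PER_SET = 8   # short blocks making up one set
NUM_SUBBANDS = 32          # sub-bands formed in the step of subdividing

def subband_width(block_length, num_subbands=NUM_SUBBANDS):
    """Number of coefficients per sub-band for equal-width sub-bands."""
    if block_length % num_subbands:
        raise ValueError("block length must be a multiple of the sub-band count")
    return block_length // num_subbands

long_width = subband_width(LONG_BLOCK)
short_width = subband_width(SHORT_BLOCK)
coefficients_per_set = SHORT_BLOCKS_PER_SET * SHORT_BLOCK
```

Eight short blocks of 128 coefficients carry the same 1024 coefficients as one long block, and eight short sub-bands of 4 coefficients match one long sub-band of 32 coefficients, which is the N-times relationship required by claim 3.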

7. A method according to one of the preceding claims, 
wherein an adaptive back-coupled predictor (504a),
preferably an LMSL predictor, is used in the step of
performing (18) the prediction.

8. A method according to one of the preceding claims, 
wherein the transform algorithm which forms the basis of 
the encoded audio signal is the same transform algorithm 
that is used in the step of reverse transforming (16) and 
in the step of forward transforming (20).

9. A method according to one of the preceding claims,
wherein the transform algorithm which is used in the step
of reverse transforming (16) is the exact inverse of the
transform algorithm that is used in the step of forward
transforming (20).

10. A method for decoding an encoded audio signal which
comprises successive sets of spectral coefficients, wherein
a set of spectral coefficients is a spectral
representation for a set of audio sampled values:

receiving (10) a current set of spectral coefficients; 

subdividing (14) a current set of spectral coefficients 
into at least two sub-bands with different frequency 
ranges, where one sub-band of the at least two sub- 
bands has at least two spectral coefficients; 

reverse transforming (16) the spectral coefficients of
the one sub-band to obtain a temporal representation of
the at least two spectral coefficients of the one
sub-band;

performing (18) a prediction using the temporal
representation of the at least two spectral coefficients of the
one sub-band to obtain an estimated temporal
representation for a sub-band of a set following the current set,
where the sub-band of the following set has the same
frequency range as the sub-band of the current set;

forward transforming (20) the estimated temporal
representation to obtain at least two estimated spectral
coefficients for the sub-band of the following set;

receiving (24) a following set of spectral coefficients 
and subdividing the following set into sub-bands which 
cover the same frequency range as the sub-bands of the 
current set; 

determining (26) whether a spectral coefficient of the 
sub-band of the following set is erroneous; 

as reaction to the step of determining, if there is an
erroneous spectral coefficient, using (28) an estimated
spectral coefficient instead of an erroneous spectral
coefficient of the following set so as to conceal the
erroneous spectral coefficient of the following set; and

processing (30) the following set using the estimated 
spectral coefficient used in the step of using (28) to 
obtain the following set of audio sampled values. 

11. A method according to claim 10, wherein the spectral
coefficients of the encoded audio signal are entropy-coded
and quantized, which includes the following steps before
the step of receiving (10) the current set or the
following set:

cancelling (200) the entropy coding to obtain quantized 
spectral coefficients;

requantizing (300) the quantized spectral coefficients to 
obtain requantized spectral coefficients; 

and wherein the step of processing includes the following 
step: 

reverse transforming (400) the following set using a 
transform algorithm which is inverse to the transform al- 
gorithm used for transforming to obtain the spectral co- 
efficients of the encoded audio signal. 

12. A device for concealing an error in an encoded audio sig- 
nal, where the encoded audio signal has successive sets 
of spectral coefficients, where a set of spectral coeffi- 
cients is a spectral representation for a set of audio 
sampled values, with the following features: 

a unit (520) for subdividing (14) a current set of spec- 
tral coefficients into at least two sub-bands with dif- 
ferent frequency ranges, where one sub-band of the at 
least two sub-bands has at least two spectral coeffi- 
cients; 

a unit (502) for reverse transforming (16) the spectral 
coefficients of the one sub-band to obtain a temporal 
representation of the at least two spectral coefficients 
of the one sub-band; 

a unit (504) for performing (18) a prediction using the
temporal representation of the at least two spectral
coefficients of the one sub-band to obtain an estimated
temporal representation for a sub-band of a set following
the current set, where the sub-band of the following set
has the same frequency range as the sub-band of the
current set;

a unit (506) for forward transforming (20) the estimated 
temporal representation to obtain at least two estimated 
spectral coefficients for the sub-band of the following 
set; 

a unit for determining (26) whether a spectral coeffi- 
cient of the sub-band of the following set is erroneous; 
and 

a unit (512) for using (28) an estimated spectral coeffi- 
cient instead of an erroneous spectral coefficient of the 
following set so as to conceal the erroneous spectral co- 
efficient of the following set. 

13. A device for decoding an encoded audio signal which
comprises successive sets of spectral coefficients, where a
set of spectral coefficients is a spectral representation
for a set of audio sampled values:

a unit (100) for receiving (10) a current set of spectral 
coefficients; 

a unit (520) for subdividing (14) a current set of spec- 
tral coefficients into at least two sub-bands with dif- 
ferent frequency ranges, where one sub-band of the at 
least two sub-bands has at least two spectral coeffi- 
cients; 



a unit (502) for reverse transforming (16) the spectral
coefficients of the one sub-band to obtain a temporal
representation of the at least two spectral coefficients 
of the one sub-band; 

a unit (504) for performing (18) a prediction using the 
temporal representation of the at least two spectral co- 
efficients of the one sub-band to obtain an estimated 
temporal representation for a sub-band of a set following 
the current set, where the sub-band of the following set 
has the same frequency range as the sub-band of the cur- 
rent set; 

a unit (506) for forward transforming (20) the estimated 
temporal representation to obtain at least two estimated 
spectral coefficients for the sub-band of the following 
set;

a unit (502, 510) for receiving (24) a following set of 
spectral coefficients and for subdividing the following 
set into sub-bands which cover the same frequency range 
as the sub-bands of the current set; 

a unit for determining (26) whether a spectral coeffi- 
cient of the sub-band of the following set is erroneous; 

a unit (512) for using (28) an estimated spectral coeffi- 
cient instead of an erroneous spectral coefficient of the 
following set so as to conceal the erroneous spectral co- 
efficient of the following set; and 

a unit for processing (30) the following set using the 
estimated spectral coefficient to obtain the following 
set of audio sampled values.

DATE OF FILING (DAY, MONTH, YEAR)    PRIORITY CLAIMED UNDER 35 U.S.C. SECTION 119

Germany    19921122.1    07 May 1999    YES



POWER OF ATTORNEY

I hereby appoint the practitioner(s) associated with the Customer Number provided below to prosecute this
application and to transact all business in the Patent and Trademark Office connected therewith.

Customer No. 24283

SEND CORRESPONDENCE TO: Customer No. 24283

DIRECT TELEPHONE CALLS TO: Carl A. Forest, 303-379-1114



DECLARATION 

I hereby declare that all statements made herein of my own knowledge are true and that all statements made 
on information and belief are believed to be true; and further that these statements were made with the 
knowledge that willful false statements and the like so made are punishable by fine or imprisonment, or 
both, under Section 1001 of Title 18 of the United States Code, and that such willful false statements may 
jeopardize the validity of the application or any patent issued thereon.

SIGNATURE(S)

Inventor's signature

Typed Inventor's Full Name: Pierre LAUBER

Date    Country of Citizenship Germany

Residence (City, Country) Nuernberg, Germany

Post Office Address Rilkestrasse 30

D-90419 Nuernberg, Germany



Inventor's signature

Typed Inventor's Full Name: Martin DIETZ

Date September 14, 2001    Country of Citizenship Germany

Residence (City, Country) Nuernberg, Germany

Post Office Address Kleinreuther Weg 47

D-90408 Nuernberg, Germany



Inventor's signature

Typed Inventor's Full Name: Juergen HERRE

Date    Country of Citizenship Germany

Residence (City, Country) Buckenhof, Germany

Post Office Address Am Eichengarten 11

D-91054 Buckenhof, Germany




Inventor's signature

Typed Inventor's Full Name: Reinhold BOEHM

Date September 14, 2001    Country of Citizenship Germany

Residence (City, Country) Nuernberg, Germany

Post Office Address Etzlaubweg 12

D-90469 Nuernberg, Germany

Inventor's signature

Typed Inventor's Full Name: Ralph SPERSCHNEIDER

Date    Country of Citizenship Germany

Residence (City, Country) Erlangen, Germany

Post Office Address Donato-Polli-Strasse 42

D-91056 Erlangen, Germany



Inventor's signature

Typed Inventor's Full Name: Daniel HOMM

Date    Country of Citizenship Germany

Residence (City, Country) Erlangen, Germany

Post Office Address Wichernstrasse 18

D-91052 Erlangen, Germany